Abstract. An evolutionary tree is a rooted tree where each internal vertex has at least two
children and where the leaves are labeled with distinct symbols representing species. Evolutionary
trees are useful for modeling the evolutionary history of species. An agreement subtree of two
evolutionary trees is an evolutionary tree which is also a topological subtree of the two given trees.
We give an algorithm to determine the largest possible number of leaves in any agreement subtree
of two trees $T_1$ and $T_2$ with $n$ leaves each. If the maximum degree $d$ of these trees is bounded by a
constant, the time complexity is $O(n \log^2 n)$ and is within a $\log n$ factor of optimal. For general $d$,
this algorithm runs in $O(nd^2 \log d \log^2 n)$ time or alternatively in $O(nd\sqrt{d} \log^3 n)$ time.

1. Introduction. An evolutionary tree is a rooted tree where each internal vertex
has at least two children and where the leaves are labeled with distinct symbols
representing species. Evolutionary trees are useful for modeling the evolutionary
history of species. Many mathematical biologists and computer scientists have been
investigating how to construct and compare evolutionary trees [?]. An agreement
subtree of two evolutionary trees is an evolutionary tree which is also a topological
subtree of the two given trees. A maximum agreement subtree is one with the largest
possible number of leaves. Different theories about the evolutionary history of the
same species often result in different evolutionary trees. A fundamental problem in
computational biology is to determine how much two theories have in common. To
a certain extent, this problem can be answered by computing a maximum agreement
subtree of two given evolutionary trees [?].

Let $T_1$ and $T_2$ be two evolutionary trees with $n$ leaves each. Let $d$ be the maximum
degree of these trees. Previously, Kubicka, Kubicki and McMorris [?9] gave an
algorithm that can compute the number of leaves in a maximum agreement subtree
of $T_1$ and $T_2$ in $O(n^{(1/2+\epsilon)\log n})$ time for $d = 2$. Steel and Warnow [47] gave the first
polynomial-time algorithm. Their algorithm runs in $O(\min\{d!\,n^2, d^{2.5} n^2 \log n\})$ time if
$d$ is bounded by a constant and in $O(n^{4.5} \log n)$ time for general trees. Farach and
Thorup [14] later reduced the time complexity of this algorithm to $O(n^2)$ for general trees.
More recently, they gave an algorithm [15] that runs in $O(n^{1.5} \log n)$ time for general
trees. If $d$ is bounded by a constant, this algorithm runs in $O(nc^{\sqrt{\log n}} + n\sqrt{d}\log n)$
time for some constant $c > 1$.

This paper presents an algorithm for computing a maximum agreement subtree
in $O(n \log^2 n)$ time for $d$ bounded by a constant. Since there is a lower bound of
$\Omega(n \log n)$, our algorithm is within a $\log n$ factor of optimal. For general $d$, this
algorithm runs in $O(nd^2 \log d \log^2 n)$ time or alternatively in $O(nd\sqrt{d} \log^3 n)$ time.

This algorithm employs new tree contraction techniques [?], [38]. With tree
contraction, we can immediately obtain an $O(n \log^5 n)$-time algorithm for $d$ bounded
by a constant. Reducing the time bound to $O(n \log^2 n)$ requires additional techniques.
We develop new results that are useful for bounding the time complexity of tree
contraction algorithms. As in [14], [15], [47], we also explore the dynamic programming
structure of the problem. We obtain some highly regular structural properties and
combine these properties with the tree contraction techniques to reduce the time
bound by a factor of $\log^2 n$. To remove the last $\log n$ factor, we incorporate some
techniques that can compute maxima of multiple sets of sequences at multiple points,
where the input sequences are in a compressed format.

We present tree contraction techniques in §2 and outline our algorithms in §3.
The maximum agreement subtree problem is solved in §4 and §5 with a discussion of
condensed sequence techniques in §5.1. Section 6 concludes this paper with an open
problem.

2. New tree contraction techniques. Throughout this paper, all trees are
rooted ones, and every nonempty tree path is a vertex-simple one from a vertex to a
descendant. For a tree $T$ and a vertex $u$, let $T^u$ denote the subtree of $T$ formed by $u$
and all its descendants in $T$.

A key idea of our dynamic programming approach is to partition $T_1$ and $T_2$ into
well-structured tree paths. We recursively solve our problem for $T_1^x$ and $T_2^y$ for all
heads $x$ and $y$ of the tree paths in the partitions of $T_1$ and $T_2$, respectively. The
partitioning is based on new tree contraction techniques developed in this section.

A tree is homeomorphic if every internal vertex of that tree has at least two
children. Note that the size of a homeomorphic tree is less than twice its number of
leaves. Let $S$ be a tree that may or may not be homeomorphic. A chain of $S$ is a
tree path in $S$ such that every vertex of the given path has at most one child in $S$.
A tube of $S$ is a maximal chain of $S$. A root path of a tree is a tree path whose head
is the root of that tree; similarly, a leaf path is one ending at a leaf. A leaf tube of
$S$ is a tube that is also a leaf path. Let $\mathcal{L}(S)$ denote the set of leaf tubes in $S$. Let
$\mathcal{R}(S) = S - \mathcal{L}(S)$, i.e., the subtree of $S$ obtained by deleting from $S$ all its leaf tubes.
The operation $\mathcal{R}$ is called the rake operation. See Figures 1 and 2 for examples of
rakes and leaf tubes.
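
The definitions of leaf tubes and the rake operation can be illustrated with a minimal Python sketch. The dictionary representation of a tree (each vertex mapped to the list of its children), the function names and the small example tree are assumptions made for this illustration only; they are not taken from the paper.

```python
def leaf_tubes(children):
    """Return the leaf tubes of a rooted tree, each listed from head to leaf.

    `children` maps every vertex to the list of its children; this dict
    representation is an assumption of the sketch.
    """
    parent = {c: v for v, cs in children.items() for c in cs}
    tubes = []
    for leaf, cs in children.items():
        if cs:
            continue  # only a leaf can end a leaf tube
        tube = [leaf]
        # Grow the chain upward while the parent has exactly one child.
        while tube[-1] in parent and len(children[parent[tube[-1]]]) == 1:
            tube.append(parent[tube[-1]])
        tubes.append(tube[::-1])
    return tubes


def rake(children):
    """Apply one rake: delete every leaf tube; return (remaining tree, tubes)."""
    tubes = leaf_tubes(children)
    removed = {v for tube in tubes for v in tube}
    remaining = {v: [c for c in cs if c not in removed]
                 for v, cs in children.items() if v not in removed}
    return remaining, tubes


# Hypothetical example: root r with children a and c; a has leaves x and y.
tree = {"r": ["a", "c"], "a": ["x", "y"], "x": [], "y": [], "c": []}
rounds = []
while tree:
    tree, tubes = rake(tree)
    rounds.append(tubes)
print(rounds)
```

In this example the first rake removes the three leaves, and the second rake removes the single remaining tube consisting of r and a, after which the tree is empty.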

Our dynamic programming approach iteratively rakes $T_1$ and $T_2$ until they become
empty. The tubes obtained in the process form the desired partitions of $T_1$ and $T_2$.
Our rake-based algorithms focus on certain sets of tubes described here. A tube system
of a tree $T$ is a set of nonempty tree paths $P_1, \dots, P_m$ in $T$ such that (1) the paths
$P_i$ contain no leaves of $T$ and (2) $T^{h_1}, \dots, T^{h_m}$ are pairwise disjoint, where $h_i$ is the
head of $P_i$. Condition (1) is required here because our rake-based algorithms process
leaves and non-leaf vertices differently. Condition (2) holds if and only if for all $i \ne j$,
$h_i$ is not an ancestor or descendant of $h_j$. We can iteratively rake $T$ to obtain tube
systems. The set of tubes obtained by the first rake, i.e., $\mathcal{L}(T)$, is not a tube system
of $T$ because $\mathcal{L}(T)$ simply consists of the leaves of $T$ and thus violates Condition
(1). Every further rake produces a tube system of $T$ until $T$ is raked to empty. Our
rake-based algorithms only use these systems although there may be others.

We next develop a theorem to bound the time complexities of rake-based algorithms
in this paper. For a tree path $P$ in a tree $T$,

• $K(P, T)$ denotes the set of children of $P$'s vertices in $T$, excluding $P$'s vertices;

• $t(P)$ denotes the number of vertices in $P$;

• $b(P, T)$ denotes the number of leaves in $T^h$ where $h$ is the head of $P$.
(The symbol $K$ stands for the word kids, $t$ for top, and $b$ for bottom.)

Given $T$, we recursively define a mapping $\Phi_T$ from the subtrees $S$ of $T$ to reals.




[Fig. 1. An example of iterative applications of rakes. After the third rake, the tree becomes empty.]

[Fig. 2. The leaf tubes deleted by the rakes in Figure 1. The third rake deletes the last leaf tube.]

If $S$ is an empty tree, then $\Phi_T(S) = 0$. Otherwise,

$$\Phi_T(S) = \Phi_T(\mathcal{R}(S)) + \sum_{P \in \mathcal{L}(S)} b(P, T) \cdot \log(1 + t(P)).$$

(Note. All logarithmic functions log in this paper are in base 2.)

Theorem 2.1. For all positive integers $n$ and all $n$-leaf homeomorphic trees $T$,
$\Phi_T(T) \le n(1 + \log n)$.

Proof. For any given $n$, $\Phi_T(T)$ is maximized when $T$ is a binary tree formed by
attaching $n$ leaves to a path of $n - 1$ vertices. The proof is by induction.

Base Case. For $n = 1$, the theorem trivially holds, since $T$ is a single leaf and
$\Phi_T(T) = 1 \cdot \log 2 = 1$.

Now assume $n \ge 2$.

Induction Hypothesis. For every positive integer $n' < n$, the theorem holds.

Induction Step. Let $r$ be the smallest integer such that $T$ is empty after $r$ rakes.
Then, at the end of the $(r-1)$-th rake, $T$ is a path $P = x_1, \dots, x_p$. Let $T_1, \dots, T_s$ be
the subtrees of $T$ rooted at vertices in $K(P, T)$. Let $n_i$ be the number of leaves in $T_i$.
Note that

$$\Phi_T(T) = n \log(p + 1) + \sum_{i=1}^{s} \Phi_{T_i}(T_i).$$

Since $1 \le n_i < n$ and $T_i$ is homeomorphic, by the induction hypothesis,

$$\Phi_T(T) \le n \log(p + 1) + \sum_{i=1}^{s} n_i (1 + \log n_i).$$

Since $\sum_{i=1}^{s} n_i = n$,

$$(1) \qquad \Phi_T(T) \le n + n \log(p + 1) + \sum_{i=1}^{s} n_i \log n_i.$$

Because $T$ is homeomorphic, each $x_i$ has at least one child in $K(P, T)$. Since $n \ge 2$,
$r \ge 2$. Then, $x_p$ cannot be a leaf in $T$ and thus has at least two children in $K(P, T)$.
Consequently, $s \ge p + 1$. Next, note that for all $m_1, m_2 > 0$,

$$m_1 \log m_1 + m_2 \log m_2 < (m_1 + m_2) \log(m_1 + m_2).$$

With this inequality and the fact that $s \ge p + 1$, we can combine the terms in the
right-hand side summation of Inequality (1) to obtain the following inequality.

$$(2) \qquad \Phi_T(T) \le n + n \log(p + 1) + \sum_{i=1}^{p+1} n_i' \log n_i',$$

where $\sum_{i=1}^{p+1} n_i' = n$ and $n_i' \ge 1$. For any given $p$, the summation in Inequality (2) is
maximized when $n_1' = n - p$ and $n_2' = \cdots = n_{p+1}' = 1$. Therefore,

$$(3) \qquad \Phi_T(T) \le n + n \log(p + 1) + (n - p) \log(n - p).$$

The right-hand side of Inequality (3) is maximized when $p = n - 1$. This gives the
desired bound and finishes the induction proof. □
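
As a sanity check on the definition of $\Phi_T$ and on the bound of Theorem 2.1, the following Python sketch evaluates $\Phi_T(T)$ by raking a tree until it is empty. The dictionary tree representation and the helper names `phi` and `caterpillar` are assumptions of this illustration. The caterpillar is the binary tree described at the start of the proof, for which the sketch yields exactly $n(1 + \log n)$.

```python
import math


def phi(children):
    """Evaluate Phi_T(T) for a tree given as {vertex: list of children}."""
    parent = {c: v for v, cs in children.items() for c in cs}
    root = next(v for v in children if v not in parent)
    # Visit parents before children, then count leaves bottom-up.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    b = {}  # b[v] = number of leaves of the original tree T inside T^v
    for v in reversed(order):
        b[v] = sum(b[c] for c in children[v]) if children[v] else 1

    current = {v: list(cs) for v, cs in children.items()}
    total = 0.0
    while current:  # one iteration per rake
        removed = set()
        for leaf in [v for v, cs in current.items() if not cs]:
            tube = [leaf]
            while tube[-1] in parent and len(current[parent[tube[-1]]]) == 1:
                tube.append(parent[tube[-1]])
            head = tube[-1]
            total += b[head] * math.log2(1 + len(tube))  # b(P,T) * log(1 + t(P))
            removed.update(tube)
        current = {v: [c for c in cs if c not in removed]
                   for v, cs in current.items() if v not in removed}
    return total


def caterpillar(n):
    """Binary tree formed by attaching n >= 2 leaves to a path of n - 1 vertices."""
    tree = {("leaf", i): [] for i in range(1, n + 1)}
    for i in range(1, n):
        below = ("leaf", n) if i == n - 1 else ("path", i + 1)
        tree[("path", i)] = [("leaf", i), below]
    return tree


print(phi(caterpillar(4)))  # equals 4 * (1 + log2(4))
```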
3. Comparing evolutionary trees. Formally, an evolutionary tree is a homeomorphic
tree whose leaves are labeled by distinct labels. The label set of an evolutionary
tree is the set of all the leaf labels of that tree.

The homeomorphic version $T'$ of a tree $T$ is the homeomorphic tree constructed
from $T$ as follows. Let $W = \{w \mid w$ is a leaf of $T$ or is the lowest common ancestor of
two leaves$\}$. $T'$ is the tree over $W$ that preserves the ancestor-descendant relationship
of $T$. Let $T_1$ and $T_2$ be two evolutionary trees with label sets $L_1$ and $L_2$, respectively.

• For a subset $L_1'$ of $L_1$, $T_1 \| L_1'$ denotes the homeomorphic version of the tree
constructed by deleting from $T_1$ all the leaves with labels outside $L_1'$.

• Let $T_1 \| T_2 = T_1 \| (L_1 \cap L_2)$.

• For a tree path $P$ of $T_1$, $P \| T_2$ denotes the tree path in $T_1 \| T_2$ formed by the
vertices of $P$ that remain in $T_1 \| T_2$.

• For a set $\mathcal{P}$ of tree paths $P_1, \dots, P_m$ of $T_1$, $\mathcal{P} \| T_2$ denotes the set of all $P_i \| T_2$.

Formally, if $L'$ is a maximum cardinality subset of $L_1 \cap L_2$ such that there exists
a label-preserving tree isomorphism between $T_1 \| L'$ and $T_2 \| L'$, then $T_1 \| L'$ and $T_2 \| L'$
are called maximum agreement subtrees of $T_1$ and $T_2$.
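
The restriction $T \| L'$ can be sketched in a few lines of Python. The nested-tuple representation (a leaf is its label, an internal vertex is a tuple of its children) and the function name `restrict` are assumptions of this sketch. It is a plain recursive version and not the linear-time batch method of Fact 1.

```python
def restrict(tree, labels):
    """Return the homeomorphic version of `tree` restricted to `labels`.

    A leaf is represented by its label; an internal vertex is a tuple of
    children. Returns None if no leaf of the tree survives.
    """
    if not isinstance(tree, tuple):  # leaf
        return tree if tree in labels else None
    kept = [r for r in (restrict(child, labels) for child in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # This vertex is no longer the lowest common ancestor of two kept
        # leaves, so it is contracted away.
        return kept[0]
    return tuple(kept)


T = (("a", "b"), ("c", ("d", "e")))
print(restrict(T, {"a", "d", "e"}))
```

In the example, the vertices above `b` and `c` lose a child each and are contracted, leaving the tree `('a', ('d', 'e'))`.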

• $\mathrm{RR}(T_1, T_2)$ denotes the number of leaves in a maximum agreement subtree of
$T_1$ and $T_2$.

• $\mathrm{RA}(T_1, T_2)$ is the mapping from each vertex $v \in T_2 \| T_1$ to $\mathrm{RR}(T_1, (T_2 \| T_1)^v)$,
i.e., $\mathrm{RA}(T_1, T_2)(v) = \mathrm{RR}(T_1, (T_2 \| T_1)^v)$.
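
For very small trees, $\mathrm{RR}(T_1, T_2)$ can be computed directly from the definition by trying label subsets from largest to smallest. The Python sketch below does exactly that and takes exponential time; it is only a reference implementation for checking small cases, not the algorithm of this paper. The nested-tuple tree representation and all function names are assumptions of the sketch.

```python
from itertools import combinations


def restrict(tree, labels):
    """Homeomorphic version of `tree` restricted to `labels` (None if empty)."""
    if not isinstance(tree, tuple):
        return tree if tree in labels else None
    kept = [r for r in (restrict(c, labels) for c in tree) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def canonical(tree):
    """Order-independent form: recursively sort the children of every vertex."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(c) for c in tree), key=repr))


def leaf_labels(tree):
    if not isinstance(tree, tuple):
        return {tree}
    return set().union(*(leaf_labels(c) for c in tree))


def rr_bruteforce(t1, t2):
    """Number of leaves in a maximum agreement subtree, by exhaustive search."""
    common = sorted(leaf_labels(t1) & leaf_labels(t2))
    for size in range(len(common), 0, -1):
        for subset in combinations(common, size):
            chosen = set(subset)
            # Equal canonical forms mean a label-preserving isomorphism exists.
            if canonical(restrict(t1, chosen)) == canonical(restrict(t2, chosen)):
                return size
    return 0


print(rr_bruteforce(((("a", "b"), "c"), "d"), ((("a", "b"), "d"), "c")))
```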

For a tree path $Q$ of $T_2$, if $Q$ is nonempty, let $H(Q, T_2)$ be the set of all vertices in
$Q$ and those in $K(Q, T_2)$. If $Q$ is empty, let $H(Q, T_2)$ consist of the root of $T_2$, and
thus, if both $T_2$ and $Q$ are empty, $H(Q, T_2) = \emptyset$.

• For a set $\mathcal{Q}$ of tree paths $Q_1, \dots, Q_m$ of $T_2$, let $\mathrm{RP}(T_1, T_2, \mathcal{Q})$ be the mapping
from $v \in \bigcup_{i=1}^{m} H(Q_i \| T_1, T_2 \| T_1)$ to $\mathrm{RR}(T_1, (T_2 \| T_1)^v)$, i.e., $\mathrm{RP}(T_1, T_2, \mathcal{Q})(v) =
\mathrm{RR}(T_1, (T_2 \| T_1)^v)$. For simplicity, when $\mathcal{Q}$ consists of only one path $Q$, let
$\mathrm{RP}(T_1, T_2, Q)$ denote $\mathrm{RP}(T_1, T_2, \mathcal{Q})$.

(The notations RR, RA and RP abbreviate the phrases root to root, root to all and
root to path. We use RR to replace the notation MAST of previous work [14], [15], [47]
for the sake of notational uniformity.)

Lemma 3.1. Let $T_1, T_2, T_3$ be evolutionary trees.

• $(T_1 \| T_2) \| T_3 = T_1 \| (T_2 \| T_3)$.

• If $T_3$ is a subtree of $T_1$, then $T_3 \| T_1 = T_1 \| T_3 = T_3$.

• $\mathrm{RR}(T_1, T_2) = \mathrm{RR}(T_1 \| T_2, T_2) = \mathrm{RR}(T_1, T_2 \| T_1) = \mathrm{RR}(T_1 \| T_2, T_2 \| T_1)$.

Proof. Straightforward. □

Fact 1 ([14]). Given an $n$-leaf evolutionary tree $T$ and $k$ disjoint sets $L_1, \dots, L_k$
of leaf labels of $T$, the subtrees $T \| L_1, \dots, T \| L_k$ can be computed in $O(n)$ time.

Proof. The ideas are to preprocess $T$ for answering queries of lowest common
ancestors [?], [?5] and to reconstruct subtrees from appropriate tree traversal
numberings [?], [?9]. □

Given $T_1$ and $T_2$, our main goal is to evaluate $\mathrm{RR}(T_1, T_2)$ efficiently. Note that
$\mathrm{RR}(T_1, T_2) = \mathrm{RR}(T_1 \| T_2, T_2 \| T_1)$ and that $T_1 \| T_2$ and $T_2 \| T_1$ can be computed in linear
time. Thus, the remaining discussion assumes that $T_1$ and $T_2$ have the same label
set. To evaluate $\mathrm{RR}(T_1, T_2)$, we actually compute $\mathrm{RA}(T_2, T_1)$ and divide the discussion
among the five problems defined below. Each problem is named as a $p$-$q$ case, where
$p$ and $q$ are the numbers of tree paths in $T_1$ and $T_2$ contained in the input. The inputs
of these problems are illustrated in Figure 3.

Problem 1 (one-one case).
Input:

1. $T_1$ and $T_2$;

2. root paths $P$ of $T_1$ and $Q$ of $T_2$ with no leaves from their respective trees;

3. $\mathrm{RP}(T_1^u, T_2, Q)$ for all $u \in K(P, T_1)$;

4. $\mathrm{RP}(T_2^v, T_1, P)$ for all $v \in K(Q, T_2)$.
Output: $\mathrm{RP}(T_1, T_2, Q)$ and $\mathrm{RP}(T_2, T_1, P)$.

The next problem generalizes Problem 1.
Problem 2 (many-one case).
Input:

1. $T_1$ and $T_2$;

2. a tube system $\mathcal{P} = \{P_1, \dots, P_m\}$ of $T_1$ and a root path $Q$ of $T_2$ with no
leaf from $T_2$;

3. $\mathrm{RP}(T_1^u, T_2, Q)$ for all $P_i$ and $u \in K(P_i, T_1)$;

4. $\mathrm{RP}(T_2^v, T_1, \mathcal{P})$ for all $v \in K(Q, T_2)$.
Output:

1. $\mathrm{RP}(T_1^{h_i}, T_2, Q)$ for the head $h_i$ of each $P_i$;

2. $\mathrm{RP}(T_2, T_1, \mathcal{P})$.
Problem 3 (zero-one case).

Input:

1. $T_1$ and $T_2$;

2. a root path $Q$ of $T_2$ with no leaf from $T_2$;

3. $\mathrm{RA}(T_2^v, T_1)$ for all $v \in K(Q, T_2)$.
Output: $\mathrm{RA}(T_2, T_1)$.

The next problem generalizes Problem 3.
Problem 4 (zero-many case).
Input:

1. $T_1$ and $T_2$;

2. a tube system $\mathcal{Q} = \{Q_1, \dots, Q_m\}$ of $T_2$;

3. $\mathrm{RA}(T_2^u, T_1)$ for all $Q_i$ and $u \in K(Q_i, T_2)$.
Output: $\mathrm{RA}(T_2^{h_i}, T_1)$ for the head $h_i$ of each $Q_i$.

Our main goal is to evaluate $\mathrm{RR}(T_1, T_2)$. It suffices to solve the next problem.

Problem 5 (zero-zero case).
Input: $T_1$ and $T_2$.
Output: $\mathrm{RA}(T_2, T_1)$.

Our algorithms for these problems are called One-One, Many-One, Zero-One,
Zero-Many and Zero-Zero, respectively. Each algorithm except One-One uses the
preceding one in this list as a subroutine. These reductions are based on the rake
operation defined in §2. We give One-One in §5 and the other four in §4.1-§4.4.

These five algorithms assume that the input trees $T_1$ and $T_2$ have $n$ leaves each
and $d$ is the maximum degree. We use integer sort and radix sort [?] extensively to
help achieve the desired time complexity. (For brevity, from here onwards, radix sort
refers to both integer and radix sorts.) For this reason, we make the following integer
indexing assumptions:

• An integer array of size $O(n)$ is allocated to each algorithm.

• The vertices of $T_1$ and $T_2$ are indexed by integers from $[1, O(n)]$.

• The leaf labels are indexed by integers from $[1, O(n)]$.

We call Zero-Zero only once to compare two given trees. Consequently, we may
reasonably assume that the tree vertices are indexed with integers from $[1, O(n)]$.
When we call Zero-Zero, we simply allocate an array of size $O(n)$. As for indexing the
leaf labels, this paper considers only evolutionary trees whose leaf labels are drawn
from a total order. Before we call Zero-Zero, we can sort the leaf labels and index
them with integers from $[1, O(n)]$. This preprocessing takes $O(n \log n)$ time, which is
well within our desired time complexity for Zero-Zero.

The other four algorithms are called more than once, and their integer indexing 
assumptions are maintained in slightly different situations from that for Zero-Zero. 
When an algorithm issues subroutine calls, it is responsible for maintaining the index- 
ing assumptions for the callees. In certain cases, the caller uses radix sort to reindex 
the labels and the vertices of each callee's input trees. The caller also partitions 
its array into segments and allocates to each callee a segment in proportion to that 
callee's input size. The new indices and the array segments for subroutine calls can 
be computed in obvious manners within the desired time complexity of each caller. 
For brevity of presentation, such preprocessing steps are omitted in the descriptions 
of the five algorithms. 

Some inputs to the algorithms are mappings. We represent a mapping $f$ by
the set of all pairs $(x, f(x))$. With this representation, the total size of the input
mappings in an algorithm is $O(n)$. Since the input mappings have integer values at
most $n$, this representation and the integer indexing assumptions together enable us to
evaluate the input mappings at many points in a batch by means of radix sort. Other
mappings that are produced within the algorithms are similarly evaluated. When
these algorithms are detailed, it becomes evident that such evaluations can be computed
in straightforward manners in time linear in $n$ and the number of points evaluated.
The descriptions of these algorithms assume that the values of mappings are accessed
by radix sort.
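
One way to read the batch evaluation described above is as a sort followed by a single merge scan. The Python sketch below is a minimal illustration under that reading: the function names, the tagging trick and the choice of base 256 are assumptions of the sketch, and the paper does not spell out these details.

```python
def radix_sort(items, key, max_key, base=256):
    """Stable LSD radix sort of `items` by the integer key(item) in [0, max_key]."""
    shift = 1
    while True:
        buckets = [[] for _ in range(base)]
        for item in items:
            buckets[(key(item) // shift) % base].append(item)
        items = [item for bucket in buckets for item in bucket]
        shift *= base
        if shift > max_key:
            return items


def batch_evaluate(pairs, queries, max_key):
    """Evaluate the mapping {x: f(x)} given by `pairs` at every point in `queries`.

    Points with no recorded value are answered with None.
    """
    # Tag 0 marks a pair (x, f(x)); tag 1 marks a query. Sorting by 2*x + tag
    # places each pair immediately before the queries for the same x.
    tagged = [(x, 0, fx) for x, fx in pairs]
    tagged += [(q, 1, pos) for pos, q in enumerate(queries)]
    tagged = radix_sort(tagged, key=lambda t: 2 * t[0] + t[1], max_key=2 * max_key + 1)
    answers = [None] * len(queries)
    current_x, current_value = None, None
    for x, tag, payload in tagged:
        if tag == 0:
            current_x, current_value = x, payload
        elif x == current_x:
            answers[payload] = current_value
    return answers


print(batch_evaluate([(3, 10), (7, 20), (300, 5)], [7, 3, 300, 4, 7], max_key=300))
```

With keys bounded by a polynomial in the input size, a constant number of passes suffices, so the total work is linear in the number of pairs plus the number of queries.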

4. The rake-based reductions. For ease of understanding, our solutions to
Problems 1-5 are presented in a different order from their logical one. This section
assumes the following theorem for Problem 1 and uses it to solve Problems 2-5.
In §5, we prove this theorem by giving an algorithm, called One-One, that solves
Problem 1 within the theorem's stated time bounds.

Theorem 4.1. Problem 1 can be solved in $O(nd^2 \log d + n \log(p + 1) \log(q + 1))$
time or alternatively in $O(nd\sqrt{d} \log n + n \log(p + 1) \log(q + 1))$ time, where $p = t(P)$
and $q = t(Q)$.

Proof. Follows from Theorem 5.14 at the end of §5.6. □



4.1. The many-one case. The following algorithm is for Problem 2 and uses
One-One as a subroutine. Note that Problem 2 is merely a multi-path version of
Problem 1.

Algorithm Many-One;
begin

1. For all $P_i$, compute $T_{1,i} = T_1^{h_i}$, $T_{2,i} = T_2 \| T_{1,i}$, and $Q_i = Q \| T_{1,i}$;

2. For all empty $Q_i$, compute part of the output as follows:

(a) Compute the root $v$ of $T_{2,i}$ and the vertex $\hat{v} \in K(Q, T_2)$ such that $v \in T_2^{\hat{v}}$;

(b) $\mathrm{RP}(T_1^{h_i}, T_2, Q)(v) \leftarrow \mathrm{RP}(T_2^{\hat{v}}, T_1, \mathcal{P})(h_i)$; (Note. $H(Q_i, T_{2,i}) = \{v\}$. This
is part of the output.)

(c) For all $x \in H(P_i, T_1)$, $\mathrm{RP}(T_2, T_1, \mathcal{P})(x) \leftarrow \mathrm{RP}(T_2^{\hat{v}}, T_1, \mathcal{P})(x)$; (Note. This
is part of the output.)

3. For all nonempty $Q_i$, compute the remaining output as follows: (Note. The
many-one case is reduced to the one-one case with input $T_{1,i}$, $T_{2,i}$, $P_i$ and $Q_i$.)

(a) For all $u \in K(P_i, T_{1,i})$, $\mathrm{RP}(T_{1,i}^u, T_{2,i}, Q_i) \leftarrow \mathrm{RP}(T_1^u, T_2, Q)$;

(b) For all $v \in K(Q_i, T_{2,i})$, compute $\mathrm{RP}(T_{2,i}^v, T_{1,i}, P_i)$ as follows:

i. Compute the vertex $\hat{v} \in K(Q, T_2)$ such that $v \in T_2^{\hat{v}}$;

ii. $\mathrm{RP}(T_{2,i}^v, T_{1,i}, P_i)(x) \leftarrow \mathrm{RP}(T_2^{\hat{v}}, T_1, \mathcal{P})(x)$ for all $x \in H(P_i, T_{1,i})$;

(c) Compute $\mathrm{RP}(T_{1,i}, T_{2,i}, Q_i)$ and $\mathrm{RP}(T_{2,i}, T_{1,i}, P_i)$ by applying One-One to
$T_{1,i}$, $T_{2,i}$, $P_i$, $Q_i$ and the mappings computed at Steps 3a and 3b;

(d) $\mathrm{RP}(T_1^{h_i}, T_2, Q) \leftarrow \mathrm{RP}(T_{1,i}, T_{2,i}, Q_i)$; (Note. This is part of the output.)

(e) For all $x \in H(P_i, T_{1,i})$, $\mathrm{RP}(T_2, T_1, \mathcal{P})(x) \leftarrow \mathrm{RP}(T_{2,i}, T_{1,i}, P_i)(x)$; (Note.
This is part of the output.)

end.

Theorem 4.2. Many-One solves Problem 2 with the following time complexities:

$$O\Bigl(nd^2 \log d + \log(1 + t(Q)) \cdot \sum_{i=1}^{m} b(P_i, T_1) \log(1 + t(P_i))\Bigr),$$

or alternatively

$$O\Bigl(nd\sqrt{d} \log n + \log(1 + t(Q)) \cdot \sum_{i=1}^{m} b(P_i, T_1) \log(1 + t(P_i))\Bigr).$$



Proof. Since $T_1$ and $T_2$ have the same label set, all $T_{2,i}$ are nonempty. To compute
the output RP, there are two cases depending on whether $Q_i$ is empty or nonempty.
These cases are computed by Steps 2 and 3. The correctness of Many-One is then
determined by that of Steps 2b, 2c, 3a, 3b, 3(b)ii, 3d and 3e. These steps can be
verified using Lemma 3.1. As for the time complexity, these steps take $O(n)$ time
using radix sort to evaluate RP. Step 1 uses Fact 1 and takes $O(n)$ time. Steps 2a
and 3(b)i take $O(n)$ time using tree traversal and radix sort. As discussed in §3,
Step 3c preprocesses the input of its One-One calls to maintain their integer indexing
assumptions. We reindex the labels and vertices of $T_{1,i}$ and $T_{2,i}$ and pass the new
indices to the calls. We also partition Many-One's $O(n)$-size array to allocate a
segment of size $|T_{1,i}|$ to the call with input $T_{1,i}$. Since the total input size of the
calls is $O(n)$, this preprocessing takes $O(n)$ time in an obvious manner. After this
preprocessing, the running time of Step 3c dominates that of Many-One. The stated
time bounds follow from Theorem 4.1 and the fact that $Q_i$ is not longer than $Q$ and
the degrees of $T_{2,i}$ are at most $d$. □

4.2. The zero-one case. The following algorithm is for Problem 3. It uses
Many-One as a subroutine to recursively compare $T_2$ with the subtrees of $T_1$ rooted
at the heads of the tubes obtained by iteratively raking $T_1$. The tubes obtained by
the first rake are compared with $T_2$ first, and the tube obtained by the last rake is
compared last.

Algorithm Zero-One;
begin

1. $S \leftarrow T_1$;

2. $LF \leftarrow \mathcal{L}(S)$; (Note. $LF$ consists of the leaves of $T_1$.)

3. For all $x \in LF$, $\mathrm{RA}(T_2, T_1)(x) \leftarrow 1$; (Note. This is part of the output.)

4. For all $u \in LF$, $\mathrm{RP}(T_1^u, T_2, Q)(y) \leftarrow 1$, where $y$ is the unique vertex of $T_2 \| T_1^u$;
(Note. This is the base case of rake-based recursion.)

5. $S \leftarrow S - \mathcal{L}(S)$;

6. while $S$ is not empty do the following steps:

(a) Compute $\mathcal{L}(S) = \{P_1, \dots, P_m\}$;

(b) Gather the mappings $\mathrm{RP}(T_1^u, T_2, Q)$ for all $P_i$ and $u \in K(P_i, T_1)$; (Note.
These mappings are either initialized at Step 4 or computed at previous
iterations of Step 6d.)

(c) $\mathrm{RP}(T_2^v, T_1, \mathcal{L}(S))(x) \leftarrow \mathrm{RA}(T_2^v, T_1)(x)$ for all $v \in K(Q, T_2)$ and all $x$ in the
domain of $\mathrm{RP}(T_2^v, T_1, \mathcal{L}(S))$;

(d) Compute $\mathrm{RP}(T_1^{h_i}, T_2, Q)$ for the head $h_i$ of each $P_i$ and $\mathrm{RP}(T_2, T_1, \mathcal{L}(S))$
by applying Many-One to $T_1$, $T_2$, $\mathcal{L}(S)$, $Q$ and the mappings obtained
at Steps 6b and 6c; (Note. This is the recursion step of rake-based
recursion.)

(e) For all $x \in \bigcup_{i=1}^{m} H(P_i, T_1)$, $\mathrm{RA}(T_2, T_1)(x) \leftarrow \mathrm{RP}(T_2, T_1, \mathcal{L}(S))(x)$; (Note.
This is part of the output.)

(f) $S \leftarrow S - \mathcal{L}(S)$;

end.

Theorem 4.3. Zero-One solves Problem 3 with the following time complexities:

$$O(nd^2 \log d \log n + n \log n \log(1 + t(Q))),$$

or alternatively

$$O(nd\sqrt{d} \log^2 n + n \log n \log(1 + t(Q))).$$



Proof. The $\mathcal{L}(S)$ at Step 6a is a tube system. The heads of the tubes in $\mathcal{L}(S)$
become children of the tubes in future $\mathcal{L}(S)$. The vertices $u \in K(P_i, T_1)$ at Step 6b are
either leaves of $T_1$ or heads of the tubes in previous $\mathcal{L}(S)$. These properties ensure
the correctness of the rake-based recursion. The remaining correctness proof uses
Lemma 3.1 to verify the correctness of Steps 3, 4, 6c and 6e. Steps 1, 2, 5, 6a, 6b and 6f
are straightforward and take $O(n)$ time. Steps 6c and 6e take $O(n)$ time using radix
sort to access RP and RA. At Step 6d, to maintain the integer indexing assumptions
for the call to Many-One, we simply pass to Many-One the indices of $T_1$ and $T_2$ and
the whole array of Zero-One. Step 6d has the same time complexity as Zero-One.
The desired time bounds follow from Theorems 2.1 and 4.2. □



4.3. The zero-many case. The following algorithm is for Problem 4 and uses
Zero-One as a subroutine. Note that Problem 4 is merely a multi-path version of
Problem 3.

Algorithm Zero-Many;
begin

1. For all $Q_i$, compute $T_{2,i} = T_2^{h_i}$ and $T_{1,i} = T_1 \| T_{2,i}$;

2. For all $Q_i$ and $v \in K(Q_i, T_{2,i})$, $\mathrm{RA}(T_{2,i}^v, T_{1,i}) \leftarrow \mathrm{RA}(T_2^v, T_1)$;

3. For all $Q_i$, compute $\mathrm{RA}(T_{2,i}, T_{1,i})$ by applying Zero-One to $T_{1,i}$, $T_{2,i}$, $Q_i$ and
the mapping computed at Step 2;

4. For all $Q_i$, $\mathrm{RA}(T_2^{h_i}, T_1) \leftarrow \mathrm{RA}(T_{2,i}, T_{1,i})$; (Note. This is the output.)

end.



Theorem 4.4. Zero-Many solves Problem 4 with the following time complexities:

$$O\Bigl(nd^2 \log d \log n + \log n \cdot \sum_{i=1}^{m} b(Q_i, T_2) \log(1 + t(Q_i))\Bigr),$$

or alternatively

$$O\Bigl(nd\sqrt{d} \log^2 n + \log n \cdot \sum_{i=1}^{m} b(Q_i, T_2) \log(1 + t(Q_i))\Bigr).$$

Proof. The proof is similar to that of Theorem 4.2. The time bounds follow from
Theorem 4.3. □

4.4. The zero-zero case. The following algorithm is for Problem 5. It uses
Zero-Many as a subroutine to recursively compare $T_1$ with the subtrees of $T_2$ rooted
at the heads of the tubes obtained by iteratively raking $T_2$. The tubes obtained by
the first rake are compared with $T_1$ first, and the tube obtained by the last rake is
compared last.

Algorithm Zero-Zero;
begin

1. $S \leftarrow T_2$;

2. $LF \leftarrow \mathcal{L}(S)$; (Note. $LF$ consists of the leaves of $T_2$.)

3. For all $v \in LF$, $\mathrm{RA}(T_2^v, T_1)(x) \leftarrow 1$, where $x$ is the only vertex in $T_1 \| T_2^v$;
(Note. This is the base case of rake-based recursion.)

4. $S \leftarrow S - \mathcal{L}(S)$;

5. while $S$ is not empty do

(a) Compute $\mathcal{L}(S) = \{Q_1, \dots, Q_m\}$;

(b) Gather the mappings $\mathrm{RA}(T_2^u, T_1)$ for all $Q_i$ and $u \in K(Q_i, T_2)$; (Note.
These mappings are either initialized at Step 3 or computed at previous
iterations of Step 5c.)

(c) Compute $\mathrm{RA}(T_2^{h_i}, T_1)$ for the head $h_i$ of each $Q_i$ by applying Zero-Many
to $T_1$, $T_2$, $\mathcal{L}(S)$ and the mappings obtained at Step 5b; (Note. This is
the recursion step of rake-based recursion.)

(d) $S \leftarrow S - \mathcal{L}(S)$;

6. $\mathrm{RA}(T_2, T_1) \leftarrow \mathrm{RA}(T_2^h, T_1)$, where $h$ is the root of $T_2$; (Note. This is the output.
If $T_2$ has only one vertex, $\mathrm{RA}(T_2, T_1)$ is computed at Step 3; otherwise it is
computed at the last iteration of Step 5c.)

end.

Theorem 4.5. Zero-Zero solves Problem 5 within $O(nd^2 \log d \log^2 n)$ time or
alternatively within $O(nd\sqrt{d} \log^3 n)$ time.

Proof. The proof is similar to that of Theorem 4.3. The time bounds follow from
Theorems 2.1 and 4.4. □



5. The one-one case. Our algorithm for Problem 1 makes extensive use of
bisection-based dynamic programming and implicit computation in compressed
formats. This problem generalizes the longest common subsequence problem [2?], [2?],
[30], [32], which has efficient dynamic programming solutions. A direct dynamic
programming approach to our problem would recursively solve the problem with $T_1^x$ and
$T_2^y$ in place of $T_1$ and $T_2$ for all vertices $x \in P$ and $y \in Q$. This approach may
require solving $\Omega(n^2)$ subproblems. To improve the time complexity, observe that the
number of leaves in a maximum agreement subtree of $T_1^x$ and $T_2^y$ can range only from
0 to $n$. Moreover, this number never increases when $x$ moves from the root of $T_1$
along $P$ to $P$'s endpoint, and $y$ remains fixed, or vice versa. Compared to the length
of $P$, $\mathrm{RR}(T_1^x, T_2^y)$ often assumes relatively few different values. Thus, to compute
this number along $P$, it is useful to compute the locations at $P$ where the number
decreases. We can find those locations with a bisection scheme and use them to
implicitly solve the $O(n^2)$ subproblems in certain compressed formats. We first describe
basic techniques used in such implicit computation in §5.1 and then proceed to discuss
bisection-based dynamic programming techniques in §5.2-§5.5. We combine all these
techniques to give an algorithm to solve Problem 1 in §5.6.
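
The observation that a nonincreasing quantity with few distinct values can be located by bisection is illustrated by the Python sketch below. It works on a single nonincreasing function supplied as a callable and is only an illustration of the general idea; it is not the One-One algorithm, and the function name and interface are assumptions of the sketch.

```python
def drop_positions(g, lo, hi):
    """Find every j in [lo, hi - 1] with g(j) > g(j + 1) for a nonincreasing g.

    An interval whose two end values agree contains no drop, so it is skipped
    without evaluating g inside it. Returns (drops, number of evaluations).
    """
    cache = {}

    def value(j):
        if j not in cache:
            cache[j] = g(j)
        return cache[j]

    drops = []

    def search(a, b):
        if value(a) == value(b):
            return  # g is constant on [a, b]
        if b == a + 1:
            drops.append(a)
            return
        mid = (a + b) // 2
        search(a, mid)
        search(mid, b)

    search(lo, hi)
    return drops, len(cache)


values = [9] * 500 + [5] * 300 + [0] * 224  # a nonincreasing sequence of length 1024
drops, evaluations = drop_positions(lambda j: values[j - 1], 1, len(values))
print(drops, evaluations)
```

With $k$ distinct values on a path of length $p$, this scheme evaluates the function at $O(k \log p)$ positions instead of all $p$.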



5.1. Condensed sequences. For integers $k_1$ and $k_2$ with $k_1 \le k_2$, let $[k_1, k_2] =
\{k_1, \dots, k_2\}$, i.e., the integer interval between $k_1$ and $k_2$. The length of an integer
interval is the number of its integers. The upper and lower halves of an even length
$[k_1, k_2]$ are $[k_1, \frac{k_1 + k_2 - 1}{2}]$ and $[\frac{k_1 + k_2 + 1}{2}, k_2]$, respectively. The regular integer intervals
are defined recursively. For all integers $\alpha \ge 0$, $[1, 2^\alpha]$ is regular. The upper and lower
halves of an even length regular interval are also regular.

For example, [1,8] is regular. Its regular subintervals are [1,4], [5,8], [1,2], [3,4],
[5,6], [7,8], and the singletons [1,1], [2,2], ..., [8,8].
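
The recursive definition of regular intervals is equivalent to a simple arithmetic test: the length must be a power of two and the interval must start one past a multiple of its length. The Python sketch below uses this test and lists all regular intervals inside $[1, 2^\alpha]$; the function names are assumptions of the sketch.

```python
def is_regular(k1, k2):
    """True if [k1, k2] arises from some [1, 2**a] by repeated halving."""
    length = k2 - k1 + 1
    power_of_two = length >= 1 and length & (length - 1) == 0
    return power_of_two and (k1 - 1) % length == 0


def regular_subintervals(a):
    """All regular intervals inside [1, 2**a], largest first."""
    result, length = [], 2 ** a
    while length >= 1:
        result.extend((s, s + length - 1) for s in range(1, 2 ** a + 1, length))
        length //= 2
    return result


print(regular_subintervals(3))
```

For $\alpha = 3$ the list starts with [1,8], [1,4], [5,8] and ends with the eight singletons, matching the example above.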

A normal sequence is a nonincreasing sequence $\{f(j)\}_{j=1}^{l}$ of nonnegative numbers.
A normal sequence is nontrivial if it has at least one nonzero term.

For example, 5,4,4,0 is a nontrivial normal sequence, whereas 0,0,0 is a trivial
one.

Let $f_1, \dots, f_k$ be $k$ normal sequences of length $l$. An interval query for $f_1, \dots, f_k$
is a pair $([k_1, k_2], j)$ where $[k_1, k_2] \subseteq [1, k]$ and $j \in [1, l]$. If $k_1 = k_2$, $([k_1, k_2], j)$ is also
called a point query. The value of a query $([k_1, k_2], j)$ is $\max_{k_1 \le i \le k_2} f_i(j)$. A query
$([k_1, k_2], j)$ is regular if $[k_1, k_2]$ is a regular integer interval.

For example, let

$f_1 = 5,4,4,3,2$;
$f_2 = 8,7,4,2,0$;
$f_3 = 9,9,5,0,0$.

Then, $f_1$, $f_2$ and $f_3$ are normal sequences of length 5. Here, $k = 3$ and $l = 5$. Thus,
$([1,3], 2)$ is an interval query; its value is $\max\{f_1(2), f_2(2), f_3(2)\} = 9$. The pair
$([1,1], 3)$ is a point query; its value is $f_1(3) = 4$. The pair $([1,2], 2)$ is a regular query;
its value is $\max\{f_1(2), f_2(2)\} = 7$.

The joint of $f_1, \dots, f_k$ is the normal sequence $f$ also of length $l$ such that $f(j) =
\max\{f_1(j), \dots, f_k(j)\}$.

Continuing the above example, the joint of $f_1, f_2, f_3$ is

$f = 9,9,5,3,2$.
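
The query values and the joint in the running example can be reproduced directly from the definitions. The following Python sketch stores each sequence as a plain list, which is an assumption of the illustration; indices in the function arguments are 1-based to match the text.

```python
def query_value(seqs, k1, k2, j):
    """Value of the interval query ([k1, k2], j); all indices are 1-based."""
    return max(seqs[i - 1][j - 1] for i in range(k1, k2 + 1))


def joint(seqs):
    """Termwise maximum of sequences of equal length."""
    return [max(column) for column in zip(*seqs)]


f = [[5, 4, 4, 3, 2],
     [8, 7, 4, 2, 0],
     [9, 9, 5, 0, 0]]
print(query_value(f, 1, 3, 2), query_value(f, 1, 1, 3), query_value(f, 1, 2, 2))
print(joint(f))
```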

The minimal condensed form of a normal sequence $\{f(j)\}_{j=1}^{l}$ is the set of all pairs
$(j, f(j))$ where $f(j) \ne 0$ and $j$ is the largest index of any $f(j')$ with $f(j') = f(j)$. A
condensed form is a set of pairs $(j, f(j))$ that includes the minimal condensed form.
The size of a condensed form is the number of pairs in it. The total size of a collection
of condensed forms is the sum of the sizes of those forms.

Continuing the above example, the minimal condensed form of $f_3$ is $\{(2, 9), (3, 5)\}$;
its size is 2. The set $\{(1, 9), (2, 9), (3, 5), (5, 0)\}$ is a condensed form of $f_3$; its size is 4.
The total size of these two forms is 6.
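
A short Python sketch may help to see why a condensed form loses no information: any term $f(j)$ equals the value recorded at the first stored index that is at least $j$, or 0 if there is none. The function names and the sorted-list storage below are assumptions of the sketch.

```python
from bisect import bisect_left


def minimal_condensed_form(f):
    """Pairs (j, f(j)) with f(j) != 0 and j the last index holding that value."""
    l = len(f)
    return [(j, f[j - 1]) for j in range(1, l + 1)
            if f[j - 1] != 0 and (j == l or f[j] != f[j - 1])]


def evaluate(form, j):
    """Recover f(j) from any condensed form given as a list sorted by index."""
    pos = bisect_left(form, (j,))  # first stored pair whose index is >= j
    return form[pos][1] if pos < len(form) else 0


f3 = [9, 9, 5, 0, 0]
form = minimal_condensed_form(f3)
print(form, [evaluate(form, j) for j in range(1, 6)])
```

The same lookup works for a non-minimal condensed form such as the size-4 form of $f_3$ given above.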

Lemma 5.1. Let $F_1, \dots, F_k$ be sets of nontrivial normal sequences of length $l$.
Let $f_i$ be the joint of the sequences in $F_i$. Given a condensed form of each sequence
in each $F_i$, we can compute the minimal condensed forms of all $f_i$ in $O(l + s)$ time
where $s$ is the total size of the input forms.

Proof. The desired minimal forms can be computed by the two steps below:

1. Sort the pairs in the given condensed forms for $F_i$ into a sequence in the
increasing order of the first components of these pairs.

2. Go through this sequence to delete all unnecessary pairs to obtain the minimal
condensed form of $f_i$.

We can use radix sort to implement Step 1 in $O(l + s)$ time for all $F_i$. Step 2 can be
easily implemented in $O(s)$ time for all $F_i$. □
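
For a single set of sequences, the computation in Lemma 5.1 can be sketched as follows. Instead of the sort in Step 1, this Python illustration buckets the pairs by index in an array of length $l$, which is a substitution made for brevity; the subsequent right-to-left scan with a running maximum plays the role of Step 2. The function name is an assumption of the sketch.

```python
def joint_minimal_form(forms, l):
    """Minimal condensed form of the joint of sequences of length l.

    `forms` holds one condensed form (any iterable of (index, value) pairs)
    per input sequence.
    """
    best = [0] * (l + 1)  # best[j] = largest value recorded at index j
    for form in forms:
        for j, value in form:
            best[j] = max(best[j], value)
    result, running = [], 0
    # The joint at j equals the largest recorded value at any index >= j,
    # so each strict increase of the running maximum marks a kept pair.
    for j in range(l, 0, -1):
        if best[j] > running:
            running = best[j]
            result.append((j, running))
    result.reverse()
    return result


forms = [[(1, 5), (3, 4), (4, 3), (5, 2)],   # f1 = 5,4,4,3,2
         [(1, 8), (2, 7), (3, 4), (4, 2)],   # f2 = 8,7,4,2,0
         [(2, 9), (3, 5)]]                   # f3 = 9,9,5,0,0
print(joint_minimal_form(forms, 5))
```

The result is the minimal condensed form of the joint 9,9,5,3,2 from the running example.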

Lemma 5.2. Let $f_1, \dots, f_k$ be nontrivial normal sequences of length $l$. Assume
that the input consists of a condensed form of each $f_i$ with a total size of $s$.

1. We can evaluate $m$ point queries in $O(m + l + s)$ time.

2. We can evaluate $m_1$ regular queries and $m_2$ irregular queries in a total of
$O(m_1 + (m_2 + l + s) \log(k + 1))$ time.

Proof. The proof of Statement 1 uses radix sort in an obvious manner. To prove
Statement 2, we assume without loss of generality that $k$ is a power of two. The input
queries can be evaluated by the following three stages within the desired time bound.

Stage 1. For each regular interval $[k_1, k_2] \subseteq [1, k]$, let $f[k_1, k_2]$ be the joint of
$f_{k_1}, \dots, f_{k_2}$. We use Lemma 5.1 $O(\log(k + 1))$ times to compute the minimal condensed
forms of all $f[k_1, k_2]$. The total size of these forms is $O(s \log(k + 1))$. This stage takes
$O((l + s) \log(k + 1))$ time.

Stage 2. For each irregular input query $([i_1, i_2], j)$, we partition $[i_1, i_2]$ into
$O(\log(k + 1))$ regular subintervals $[h_1, h_2], [h_2 + 1, h_3], \dots, [h_{r-1} + 1, h_r]$. Then, the
value of $([i_1, i_2], j)$ is the maximum of those of $([h_1, h_2], j), \dots, ([h_{r-1} + 1, h_r], j)$. These
regular queries are point queries for $f[h_1, h_2], \dots, f[h_{r-1} + 1, h_r]$. Together with the
given $m_1$ regular queries, we have now generated $O(m_1 + m_2 \log(k + 1))$ point queries
for all $f[k_1, k_2]$. This stage takes $O(m_1 + m_2 \log(k + 1))$ time.

Stage 3. We use Statement 1 and the minimal condensed forms of $f[k_1, k_2]$ to
evaluate the point queries generated at Stage 2. Once the values of these point
queries are obtained, we can easily compute the values of the input queries. This
stage takes $O(m_1 + m_2 \log(k + 1) + l + s \log(k + 1))$ time. □
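
The partition used in Stage 2 can be produced greedily: starting at the left end, repeatedly take the longest regular interval that begins at the current position and still fits. The Python sketch below shows this decomposition and uses it to answer an interval query. For simplicity it recomputes each block joint from uncompressed lists rather than from the precomputed condensed forms of Stage 1, and the function names are assumptions of the sketch.

```python
def regular_decomposition(lo, hi):
    """Split [lo, hi] into consecutive regular intervals, left to right."""
    parts = []
    while lo <= hi:
        length = 1
        # Double the block while it stays aligned and inside [lo, hi].
        while (lo - 1) % (2 * length) == 0 and lo + 2 * length - 1 <= hi:
            length *= 2
        parts.append((lo, lo + length - 1))
        lo += length
    return parts


def interval_query(seqs, lo, hi, j):
    """Value of ([lo, hi], j) as a maximum over joints of regular blocks."""
    best = 0
    for a, b in regular_decomposition(lo, hi):
        block_joint = [max(column) for column in zip(*seqs[a - 1:b])]
        best = max(best, block_joint[j - 1])  # a point query on f[a, b]
    return best


f = [[5, 4, 4, 3, 2], [8, 7, 4, 2, 0], [9, 9, 5, 0, 0]]
print(regular_decomposition(2, 7), interval_query(f, 1, 3, 2))
```

Each block is at least as long as allowed by alignment, so an interval inside $[1, k]$ is covered by $O(\log(k + 1))$ blocks.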

5.2. Normalizing the input. To solve Problem 1, we first augment its input
$T_1, T_2, P$ and $Q$ in order to simplify our discussion. Let $P = x_1, \ldots, x_p$ and $Q =
y_1, \ldots, y_q$. Without loss of generality, we assume that $p \geq q$.

1. Let $\alpha$ and $\beta$ be the smallest positive integers such that $p' = 2^\alpha + 1$, $q' = 2^\beta + 1$,
$p' \geq q'$, $p' > p$ and $q' > q$. (Note. The conditions $p' > p$ and $q' > q$ are
employed for technical simplicity. They can be changed to $p' \geq p$ and $q' \geq q$
with some modification on Algorithm One-One.)

2. Attach to $x_p$ the path $x_{p+1}, \ldots, x_{p'}$ and to $y_q$ the path $y_{q+1}, \ldots, y_{q'}$.

3. Let $P' = x_1, \ldots, x_{p'}$ and $Q' = y_1, \ldots, y_{q'}$.

4. Attach a leaf to each of $x_{p+1}, \ldots, x_{p'-1}$ and $y_{q+1}, \ldots, y_{q'-1}$, two leaves to $x_{p'}$,
and two leaves to $y_{q'}$.

5. Assign distinct labels to the new leaves which also differ from the existing
labels of $T_1$ and $T_2$.

6. Let $S_1$ be $T_1$ together with $P'$ and the new leaves of $P'$. Let $S_2$ be $T_2$ together
with $Q'$ and the new leaves of $Q'$.

$S_1$ and $S_2$ are evolutionary trees. $P'$ and $Q'$ contain no leaves from $S_1$ and $S_2$,
and are root paths of these trees. Let $n' = \max\{n_1, n_2\}$ where $n_i$ is the number of
leaves in $S_i$. Let $d'$ be the maximum degree in $S_1$ and $S_2$.
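
The padded path lengths of Step 1 can be computed directly. The following is a minimal sketch of that step only; the function name `padded_length` is hypothetical, and the strict condition $p' > p$ is the one stated in the note above.

```python
def padded_length(p):
    """Return (alpha, p_prime), where alpha is the smallest positive integer
    with p_prime = 2**alpha + 1 strictly greater than p."""
    alpha = 1
    while 2 ** alpha + 1 <= p:
        alpha += 1
    return alpha, 2 ** alpha + 1
```

For example, `padded_length(5)` returns `(3, 9)`: the candidates 3 and 5 do not exceed 5, so the path is padded to 9 vertices. Since $p' \leq 2p + 1$ always holds, the padding keeps $p' = O(p)$.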

Lemma 5.3.

• $n' = O(n)$, $p' = O(p)$, $q' = O(q)$, and $d' \leq d + 1$.

• RP$(T_1, T_2, Q)$ = RP$(S_1, S_2, Q')$ and RP$(T_2, T_1, P)$ = RP$(S_2, S_1, P')$.

Proof. Straightforward. □

In light of Lemma 5.3, our discussion below mainly works with $S_1$, $S_2$, $P'$ and $Q'$.
Let $G = G_P \cup G_Q$ where $G_P$ is the set of all pairs $(x_i, y_1)$ and $G_Q$ is the set of all
$(x_1, y_j)$. To solve Problem 1, a main task is to evaluate RR$(S_1^x, S_2^y)$ for $(x, y) \in G$.
The output RP values that are excluded here can be retrieved directly from the input
RP mappings.

5.3. Predecessors. A pair $(x_{i'}, y_{j'})$ is a predecessor of a distinct $(x_i, y_j)$ if $i \leq i'$
and $j \leq j'$. One-One proceeds by recursively reducing the problem of computing
RR$(S_1^{x_i}, S_2^{y_j})$ to that of computing the RR values of the $P$-predecessor, $Q$-predecessor
and $PQ$-predecessor defined below.

Let $P[i, i']$ be the path $x_i, \ldots, x_{i'}$, where $i \leq i'$. Let $X_i$ be the set of the children
of $x_i$ in $S_1$ that are not in $P'$. We similarly define $Q[j, j']$ and $Y_j$. A pair $(x_i, y_j)$ is
intersecting if $S_1^u$ and $S_2^v$ have at least one common leaf label for some $u \in X_i$ and
$v \in Y_j$. $(P[i, i'], Q[j, j'])$ is intersecting if some $x_{i''} \in P[i, i']$ and $y_{j''} \in Q[j, j']$ form
an intersecting pair.

The lengths of $P[i, i']$ and $Q[j, j']$ are those of $[i, i']$ and $[j, j']$, respectively. A
path $P[i, i']$ is regular if $[i, i']$ is a regular interval. A regular $Q[j, j']$ is similarly
defined. We now construct a tree $\Psi$ over pairs of regular paths; this tree is slightly
different from the corresponding tree constructed earlier. The root of $\Psi$ is
$(P[1, p' - 1], Q[1, q' - 1])$. A pair $(P[i, i'], Q[j, j']) \in \Psi$ is a leaf if and only if
either (1) $i = i'$, $j = j'$ and $(x_i, y_j)$ is intersecting, or (2) this pair is nonintersecting.
For a nonleaf $(P[i, i'], Q[j, j']) \in \Psi$, if $j = j'$, then its children are
$(P[i, \frac{i+i'-1}{2}], y_j)$ and $(P[\frac{i+i'+1}{2}, i'], y_j)$. Otherwise, this pair has four children
$(P[i, \frac{i+i'-1}{2}], Q[j, \frac{j+j'-1}{2}])$, $(P[i, \frac{i+i'-1}{2}], Q[\frac{j+j'+1}{2}, j'])$,
$(P[\frac{i+i'+1}{2}, i'], Q[j, \frac{j+j'-1}{2}])$, $(P[\frac{i+i'+1}{2}, i'], Q[\frac{j+j'+1}{2}, j'])$.

The ceiling of $(P[i, i'], Q[j, j'])$ is $(x_i, y_j)$; its floor is $(x_{i'+1}, y_{j'+1})$. Its
$P$-diagonal is $(x_{i'+1}, y_j)$; its $Q$-diagonal is $(x_i, y_{j'+1})$. Let $E$ be the set of all
ceilings, diagonals, floors of the leaves of $\Psi$. Let $B = \{(x_i, y_{q'}) \mid i \in [1, p']\} \cup
\{(x_{p'}, y_j) \mid j \in [1, q']\}$. Due to its recursive nature, One-One evaluates
RR$(S_1^x, S_2^y)$ for all $(x, y) \in G \cup E \cup B$.
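
The splitting rule for a nonleaf of the tree and the four corner pairs attached to a node can be written down directly. The sketch below encodes a node $(P[i,i'], Q[j,j'])$ as the index tuple `(i, i2, j, j2)`; this encoding and the function names are illustrative assumptions, and the leaf test (which needs the intersecting predicate) is deliberately left to the caller.

```python
def halves(a, b):
    """Split the index interval [a, b] of even length into two equal halves."""
    mid = (a + b - 1) // 2
    return [(a, mid), (mid + 1, b)]


def children(i, i2, j, j2):
    """Children of the nonleaf node (P[i,i2], Q[j,j2]); indices are inclusive.

    The case i == i2 with j < j2 is assumed not to occur, since the P side
    is never shorter than the Q side in the construction.
    """
    if i == i2 and j == j2:
        return []  # a single pair cannot be split further
    assert i < i2, "P side must still be splittable"
    if j == j2:
        return [(a, b, j, j) for a, b in halves(i, i2)]
    return [(a, b, c, d) for a, b in halves(i, i2) for c, d in halves(j, j2)]


def corners(i, i2, j, j2):
    """Ceiling, floor and diagonals of (P[i,i2], Q[j,j2]) as index pairs."""
    return {
        "ceiling": (i, j),
        "floor": (i2 + 1, j2 + 1),
        "P-diagonal": (i2 + 1, j),
        "Q-diagonal": (i, j2 + 1),
    }
```

For instance, `children(1, 4, 1, 2)` yields the four nodes `(1, 2, 1, 1)`, `(1, 2, 2, 2)`, `(3, 4, 1, 1)` and `(3, 4, 2, 2)`.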

Given $(x_i, y_j)$, if $(x_{i+1}, y_{j+1}) \in G \cup E \cup B$, then this pair is the $PQ$-predecessor of
$(x_i, y_j)$. Let $i'$ be the smallest index that is larger than $i$ such that $(x_{i'}, y_j) \in G \cup E \cup B$.
This $(x_{i'}, y_j)$ is the $P$-predecessor of $(x_i, y_j)$. Let $j'$ be the smallest index larger than
$j$ such that $(x_i, y_{j'}) \in G \cup E \cup B$. This $(x_i, y_{j'})$ is the $Q$-predecessor of $(x_i, y_j)$.
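
Given the pair set as plain index pairs, the three predecessors can be looked up by sorting. The following sketch uses comparison sorting and binary search rather than the radix sort the paper relies on for its time bounds; the function name and the dictionary layout are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_predecessors(pairs):
    """Map each index pair (i, j) to its P-, Q- and PQ-predecessor (or None)."""
    pairs = set(pairs)
    rows_for_col = defaultdict(list)  # j -> all i with (i, j) in the set
    cols_for_row = defaultdict(list)  # i -> all j with (i, j) in the set
    for i, j in pairs:
        rows_for_col[j].append(i)
        cols_for_row[i].append(j)
    for values in rows_for_col.values():
        values.sort()
    for values in cols_for_row.values():
        values.sort()

    def next_larger(sorted_values, x):
        k = bisect_right(sorted_values, x)
        return sorted_values[k] if k < len(sorted_values) else None

    result = {}
    for i, j in pairs:
        i_next = next_larger(rows_for_col[j], i)
        j_next = next_larger(cols_for_row[i], j)
        result[(i, j)] = {
            "P": (i_next, j) if i_next is not None else None,
            "Q": (i, j_next) if j_next is not None else None,
            "PQ": (i + 1, j + 1) if (i + 1, j + 1) in pairs else None,
        }
    return result
```

Pairs on the boundary of the index range simply receive `None` for the predecessors that do not exist.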

Lemma 5.4.

1. Each intersecting $(x_i, y_j) \in (G \cup E) - B$ has a $P$-predecessor $(x_{i+1}, y_j)$, a
$Q$-predecessor $(x_i, y_{j+1})$ and a $PQ$-predecessor $(x_{i+1}, y_{j+1})$.

2. Each nonintersecting $(x_i, y_j) \in E - B$ has a $P$-predecessor $(x_{i'}, y_j)$ and a
$Q$-predecessor $(x_i, y_{j'})$. Also, $(P[i, i' - 1], Q[j, j' - 1])$ is nonintersecting.

3. Each nonintersecting $(x_i, y_1) \in G_P - B$ has a $P$-predecessor $(x_{i+1}, y_1)$ and a
$Q$-predecessor $(x_i, y_j)$. Moreover, $(x_i, Q[1, j - 1])$ is nonintersecting.

4. Each nonintersecting $(x_1, y_j) \in G_Q - B$ has a $P$-predecessor $(x_i, y_j)$ and a
$Q$-predecessor $(x_1, y_{j+1})$. Moreover, $(P[1, i - 1], y_j)$ is nonintersecting.

Proof. Statement 1 follows from the definitions of $\Psi$ and $E$. The proofs of
Statements 3 and 4 are similar to Case 3 in the proof of Statement 2 below.

As for Statement 2, by the definition of $B$, $x_{i'}$ and $y_{j'}$ exist. To show that
$(P[i, i' - 1], Q[j, j' - 1])$ is nonintersecting, we consider the following four cases. The
proofs of their symmetric cases are similar and are omitted for brevity.

Case 1: $(x_i, y_j)$ is the ceiling of a nonintersecting leaf $(P[i, i_2], Q[j, j_2]) \in \Psi$.
Since $(x_i, y_{j_2+1})$ and $(x_{i_2+1}, y_j)$ are in $E$, $i' \leq i_2 + 1$ and $j' \leq j_2 + 1$. Then because
$(P[i, i_2], Q[j, j_2])$ is nonintersecting, so is $(P[i, i' - 1], Q[j, j' - 1])$.

Case 2: $(x_i, y_j)$ is the $Q$-diagonal of a nonintersecting leaf $(P[i, i_2], Q[j_1, j - 1])$ (or
symmetrically, $(x_i, y_j)$ is the $P$-diagonal of a nonintersecting leaf $(P[i_1, i - 1], Q[j, j_2])$).
Since $(x_{i_2+1}, y_j)$ is the floor of $(P[i, i_2], Q[j_1, j - 1])$, $(x_{i_2+1}, y_j) \in E$ and thus $i' \leq i_2 + 1$.
Let $j''$ be the smallest index such that $j \leq j''$ and $(P[i, i_2], y_{j''})$ is intersecting. There
are two subcases.

Case 2a: $j''$ does not exist. Then, $(P[i, i_2], Q[j, q'])$ is nonintersecting and therefore
$(P[i, i' - 1], Q[j, j' - 1])$ is nonintersecting.
Case 2b: $j''$ exists. Let $Q[j_3, j_4]$ be a regular path that contains $y_{j''}$ and is of the
same length as $Q[j_1, j - 1]$. Note that $j \leq j_3$ and $(P[i, i_2], Q[j_3, j_4]) \in \Psi$. There are
two subcases.

Case 2b(1): $j_3 = j$. Then $(x_i, y_j)$ is the ceiling of $(P[i, i_2], Q[j_3, j_4])$. Since $(x_i, y_j)$
is nonintersecting, it is the ceiling of a nonintersecting leaf in $\Psi$ which is a descendant
of $(P[i, i_2], Q[j_3, j_4])$. Therefore, Case 2b(1) is reduced to Case 1.

Case 2b(2): $j_3 > j$. By the construction of $\Psi$, $(x_i, y_{j_3}) \in E$ and thus $j' \leq j_3$.
By the choice of $Q[j_3, j_4]$, $(P[i, i_2], Q[j, j_3 - 1])$ is nonintersecting and so is
$(P[i, i' - 1], Q[j, j' - 1])$.

Case 3: $(x_i, y_j)$ is the $Q$-diagonal of an intersecting leaf $(x_i, y_{j-1})$ (or symmetrically,
$(x_i, y_j)$ is the $P$-diagonal of an intersecting leaf $(x_{i-1}, y_j)$). Since $(x_{i+1}, y_j) \in E$,
$i' = i + 1$ and $P[i, i' - 1] = x_i$. Let $j''$ be the smallest index such that $j \leq j''$ and
$(x_i, y_{j''})$ is intersecting. There are two subcases.

Case 3a: $j''$ does not exist. Then, $(x_i, Q[j, q'])$ is nonintersecting and therefore
$(P[i, i' - 1], Q[j, j' - 1])$ is nonintersecting.

Case 3b: $j''$ exists. Then, $(x_i, y_{j''}) \in E$ and $j' \leq j''$. By the choice of $j''$,
$(x_i, Q[j, j'' - 1])$ is nonintersecting. Thus, $(P[i, i' - 1], Q[j, j' - 1])$ is nonintersecting.

Case 4: $(x_i, y_j)$ is the floor of a leaf $(P[i_1, i - 1], Q[j_1, j - 1])$, which may or may not
be intersecting. Let $(P[i_3, i_4], Q[j_3, j_4])$ be the lowest ancestor of $(P[i_1, i - 1], Q[j_1, j - 1])$
in $\Psi$ such that $(x_i, y_j)$ is not the floor of $(P[i_3, i_4], Q[j_3, j_4])$. This ancestor exists
because $(x_i, y_j) \notin B$. There are two subcases.

Case 4a: $j_3 = j_4$ and $i_3 < i_4$. Then, $P[i_1, i - 1]$ is a subpath of $P[i_3, \frac{i_3+i_4-1}{2}]$
and $i = \frac{i_3+i_4+1}{2}$. Also, $j_3 = j_4 = j - 1$. Thus, $(x_i, y_j)$ is the $Q$-diagonal of
$(P[i, i_4], y_{j-1}) \in \Psi$. By the construction of $\Psi$, $(x_i, y_j)$ is the $Q$-diagonal of a leaf
which is either $(P[i, i_4], y_{j-1})$ itself or its descendant. Depending on whether this leaf
is nonintersecting or intersecting, Case 4a is reduced to Case 2 or 3.
Case 4b: $j_3 < j_4$ and $i_3 < i_4$. There are two subcases.

Case 4b(1): $P[i_1, i - 1] \subseteq P[i_3, \frac{i_3+i_4-1}{2}]$ and $Q[j_1, j - 1] \subseteq Q[j_3, \frac{j_3+j_4-1}{2}]$. Note that
$i = \frac{i_3+i_4+1}{2}$, $j = \frac{j_3+j_4+1}{2}$, and $(x_i, y_j)$ is the ceiling of
$(P[\frac{i_3+i_4+1}{2}, i_4], Q[\frac{j_3+j_4+1}{2}, j_4]) \in \Psi$. Since $(x_i, y_j)$ is nonintersecting, $(x_i, y_j)$ is the
ceiling of a nonintersecting leaf in $\Psi$ which is $(P[\frac{i_3+i_4+1}{2}, i_4], Q[\frac{j_3+j_4+1}{2}, j_4])$ itself or a
descendant. This reduces Case 4b(1) to Case 1.

Case 4b(2): $P[i_1, i - 1] \subseteq P[i_3, \frac{i_3+i_4-1}{2}]$ and $Q[j_1, j - 1] \subseteq Q[\frac{j_3+j_4+1}{2}, j_4]$ (or
symmetrically, $P[i_1, i - 1] \subseteq P[\frac{i_3+i_4+1}{2}, i_4]$ and $Q[j_1, j - 1] \subseteq Q[j_3, \frac{j_3+j_4-1}{2}]$). Note that
$i = \frac{i_3+i_4+1}{2}$, $j = j_4 + 1$, and $(x_i, y_j)$ is the $Q$-diagonal of
$(P[\frac{i_3+i_4+1}{2}, i_4], Q[\frac{j_3+j_4+1}{2}, j_4]) \in \Psi$. Then, $(x_i, y_j)$ is the $Q$-diagonal of a leaf which is
$(P[\frac{i_3+i_4+1}{2}, i_4], Q[\frac{j_3+j_4+1}{2}, j_4])$ itself or a descendant. Depending on whether this leaf is
nonintersecting or intersecting, Case 4b(2) is reduced to Case 2 or 3. □

5.4. Counting lemmas. We now give some counting lemmas that are used
in §5.6 to bound One-One's time complexity.

For all $(P[i_1, i_2], Q[j_1, j_2]) \in \Psi$,

• $C(P[i_1, i_2], Q[j_1, j_2])$ denotes the set of all ceilings of the leaves in $\Psi$ which
are either $(P[i_1, i_2], Q[j_1, j_2])$ itself or its descendants;

• $D(P[i_1, i_2], Q[j_1, j_2])$ denotes the set of all $Q$-diagonals of the leaves in $\Psi$
which are either $(P[i_1, i_2], Q[j_1, j_2])$ itself or its descendants;

• $I(P[i_1, i_2], Q[j_1, j_2]) = \{(x_i, y_j) \mid x_i \in P[i_1, i_2], y_j \in Q[j_1, j_2]$ and $(x_i, y_j)$ is
intersecting$\}$.

Lemma 5.5.

1. $|I(P[1, p' - 1], Q[1, q' - 1])| \leq n$.

2. $\Psi$ has $O(n \log(q + 1))$ leaves of the form $(P[i_1, i_2], Q[j_1, j_2])$ where $j_1 < j_2$.

3. $\Psi$ has $O(n \log(q + 1))$ pairs of the form $(P[i_1, i_2], y_j)$ where $P[i_1, i_2]$ is of
length $\frac{p'-1}{q'-1}$.

4. $|E| = O(n \log(p + 1))$.

Proof. Statements 1-3 are proved below. The proof of Statement 4 is similar to
those of Statements 2 and 3.

Statement 1. For all distinct intersecting pairs $(x_i, y_j)$ and $(x_{i'}, y_{j'})$, the leaf
labels shared by the subtrees $T_1^u$ where $u \in X_i$ and the subtrees $T_2^v$ where $v \in Y_j$ are
different from the shared labels for $X_{i'}$ and $Y_{j'}$. Statement 1 then follows from the
fact that $S_1$ and $S_2$ share $n$ leaf labels.

Statements 2 and 3. On each level of $\Psi$, for all distinct pairs $(P[i_1, i_2], Q[j_1, j_2])$
and $(P[i_1', i_2'], Q[j_1', j_2'])$, $I(P[i_1, i_2], Q[j_1, j_2]) \cap I(P[i_1', i_2'], Q[j_1', j_2']) = \emptyset$. Thus, each
level has at most $|I(P[1, p' - 1], Q[1, q' - 1])|$ nonleaf pairs. Consequently, from the
second level downwards, each level has at most $4 \cdot |I(P[1, p' - 1], Q[1, q' - 1])|$ pairs.
These two statements then follow from Statement 1 and the fact that the pairs
specified in these two statements are within the top $1 + \log(q' - 1)$ levels of $\Psi$. □

A pair $(x_i, y_j)$ is $P$-regular if $[i, i' - 1]$ is a regular interval where $(x_{i'}, y_j)$ is the
$P$-predecessor of $(x_i, y_j)$. (We do not need the notion of $Q$-regular because $p' \geq q'$.)

Given a regular $[i_1, i_2]$, a set $\{h_1, \ldots, h_k\}$ regularly partitions $[i_1, i_2]$ if $h_1 = i_1$ and
the intervals $[h_1, h_2 - 1], [h_2, h_3 - 1], \ldots, [h_{k-1}, h_k - 1], [h_k, i_2]$ are all regular.

Lemma 5.6.

1. Assume that $j > 1$ and $(P[i_1, i_2], y_j) \in \Psi$. If the $P$-predecessor $(x_i, y_j)$ of
some $(x_{i'}, y_j) \in C(P[i_1, i_2], y_j)$ is not in $\{(x_{i_2+1}, y_j)\} \cup C(P[i_1, i_2], y_j)$, then
$(P[i_1, i_2], y_{j-1}) \in \Psi$ and $(x_i, y_j) \in D(P[i_1, i_2], y_{j-1})$.

2. Assume that $j < q'$ and $(P[i_1, i_2], y_{j-1}) \in \Psi$. If the $P$-predecessor $(x_i, y_j)$ of
some $(x_{i'}, y_j) \in D(P[i_1, i_2], y_{j-1})$ is not in $\{(x_{i_2+1}, y_j)\} \cup D(P[i_1, i_2], y_{j-1})$,
then $(P[i_1, i_2], y_j) \in \Psi$ and $(x_i, y_j) \in C(P[i_1, i_2], y_j)$.

3. For every $(P[i_1, i_2], y_j) \in \Psi$, the set $\{i \mid (x_i, y_j) \in C(P[i_1, i_2], y_j)\}$ regularly
partitions $[i_1, i_2]$ and so does the set $\{i \mid (x_i, y_j) \in D(P[i_1, i_2], y_j)\}$.

4. For all $(P[i_1, i_2], y_j) \in \Psi$, every pair in $C(P[i_1, i_2], y_j) \cup D(P[i_1, i_2], y_j)$ is
$P$-regular.

5. At most $O(n \log(q + 1))$ of the nonintersecting pairs of $E$ are $P$-irregular.

Proof. The proofs of Statements 1 and 5 are detailed below. The proof of Statement 2
is similar to that of Statement 1 and is omitted. Statement 3 is obvious.
Statement 4 follows from the first three statements and the fact that if two sets
regularly partition $[i_1, i_2]$, then so does their union.

Statement 1. Note that $i_1 < i \leq i_2$ and $q' > j > 1$. The pair $(x_i, y_j)$ can be the
ceiling, the $P$-diagonal, the $Q$-diagonal, or the floor of some leaf $(P[i_3, i_4], Q[j_3, j_4]) \in \Psi$.
These four cases are discussed below.

Case 1: $(x_i, y_j)$ is the ceiling. Then $i = i_3$ and $j = j_3$. Since $i_1 < i \leq i_2$ and both
$[i, i_4]$ and $[i_1, i_2]$ are regular, $[i, i_4] \subseteq [i_1, i_2]$. Since the length of $P[i_1, i_2]$ is at most
$\frac{p'-1}{q'-1}$, so is the length of $P[i, i_4]$. Thus $Q[j_3, j_4] = y_j$ and $(P[i, i_4], y_j)$ is a descendant
of $(P[i_1, i_2], y_j)$. This contradicts the assumption that $(x_i, y_j) \notin C(P[i_1, i_2], y_j)$ and
this case cannot exist.

Case 2: $(x_i, y_j)$ is the $P$-diagonal. Then $i = i_4 + 1$ and $j = j_3$. As in Case 1,
$Q[j_3, j_4] = y_j$ and $(P[i_3, i - 1], y_j)$ is a descendant of $(P[i_1, i_2], y_j)$. Thus, there
exists a leaf $(P[i, i_6], y_j)$ that is a descendant of $(P[i_1, i_2], y_j)$. Because $(x_i, y_j)$ is
the ceiling of this leaf, the existence of this leaf contradicts the assumption that
$(x_i, y_j) \notin C(P[i_1, i_2], y_j)$ and this case cannot exist.

Case 3: $(x_i, y_j)$ is the $Q$-diagonal. Then, $i = i_3$ and $j = j_4 + 1$. As in Case 1,
$[i, i_4] \subseteq [i_1, i_2]$ and $Q[j_3, j_4] = y_{j-1}$. Since $(P[i, i_4], y_{j-1}) \in \Psi$, $(P[i_1, i_2], y_{j-1}) \in \Psi$.
Then $(P[i, i_4], y_{j-1})$ is a descendant of $(P[i_1, i_2], y_{j-1})$ and $(x_i, y_j) \in D(P[i_1, i_2], y_{j-1})$.

Case 4: $(x_i, y_j)$ is the floor. Then, $i = i_4 + 1$ and $j = j_4 + 1$. As in Case 3,
$(P[i_1, i_2], y_{j-1}) \in \Psi$, $Q[j_3, j - 1] = y_{j-1}$ and $(P[i_3, i - 1], y_{j-1})$ is a descendant
of $(P[i_1, i_2], y_{j-1})$. Thus, there is a leaf $(P[i, i_6], y_{j-1})$ which is a descendant of
$(P[i_1, i_2], y_{j-1})$. Since $(x_i, y_j)$ is this leaf's $Q$-diagonal, it is in $D(P[i_1, i_2], y_{j-1})$.

Statement 5. Note that $E$ consists of the following three types of pairs:

1. the ceiling, diagonals and floor of a leaf $(P[i_1, i_2], Q[j_1, j_2])$ where $j_1 < j_2$;

2. the $P$-diagonal and floor of $(P[i_1, i_2], y_j) \in \Psi$ where $P[i_1, i_2]$ is of length
$\frac{p'-1}{q'-1}$;

3. the pairs in $C(P[i_1, i_2], y_j) \cup D(P[i_1, i_2], y_j)$ where $(P[i_1, i_2], y_j) \in \Psi$ and
$P[i_1, i_2]$ is of length $\frac{p'-1}{q'-1}$.

By Statement 4, only the pairs of the first two types may be $P$-irregular. This
statement then follows from Lemmas 5.5(2) and 5.5(3). □

5.5. Recurrences. One-One uses the following formulas to recursively compute
RR$(S_1^{x_i}, S_2^{y_j})$ for $(x_i, y_j) \in G \cup E \cup B$ in terms of the RR values of the appropriate
$P$-predecessor, $Q$-predecessor and $PQ$-predecessor of $(x_i, y_j)$.

For vertex subsets $U$ of $S_1$ and $V$ of $S_2$, $\mu(U, V)$ denotes the maximum weight of
any matching of the bipartite graph $(U, V, U \times V)$ where the weight of an edge $(u, v)$
is RR$(S_1^u, S_2^v)$. Let $\mu(U, v) = \mu(U, \{v\})$ and $\mu(u, V) = \mu(\{u\}, V)$. Given two vertices
$x \in S_1$ and $y \in S_2$, let $\mu(U, V, x, y)$ be the maximum weight of any matching of the
same graph without the edge $(x, y)$.
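
To make the two matching quantities concrete, the following brute-force sketch evaluates the maximum matching weight on a complete bipartite graph, optionally with one edge removed. It runs in exponential time and is meant only for checking small cases; it is not the matching algorithm the paper relies on for its time bounds. The function name `mu` and the `weight` callback, which stands in for the RR values, are assumptions.

```python
def mu(U, V, weight, forbidden=None):
    """Maximum weight of a matching of the bipartite graph (U, V, U x V).

    weight(u, v) returns a nonnegative edge weight. If forbidden is a pair
    (x, y), that single edge is excluded, as in the four-argument form.
    """
    U, V = list(U), list(V)
    if not U or not V:
        return 0
    u, rest = U[0], U[1:]
    best = mu(rest, V, weight, forbidden)  # option: leave u unmatched
    for k, v in enumerate(V):
        if (u, v) == forbidden:
            continue
        remaining = V[:k] + V[k + 1:]
        best = max(best, weight(u, v) + mu(rest, remaining, weight, forbidden))
    return best
```

With weights `ac=3, ad=2, bc=4, bd=1`, the call on `U = ["a", "b"]`, `V = ["c", "d"]` selects the edges `ad` and `bc`; forbidding `("b", "c")` forces the matching `ac`, `bd` instead.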

Lemma 5.7. For each $(x_i, y_j) \in B$, RR$(S_1^{x_i}, S_2^{y_j}) = 0$.

Proof. This lemma follows from the fact that $p' > p$, $q' > q$ and the new labels of
$S_1$ and $S_2$ are different from one another and the labels of $T_1$ and $T_2$. □



Fact 2 ([47]). For all vertices $u \in S_1$ and $v \in S_2$,

$$\mathrm{RR}(S_1^u, S_2^v) = \max \left\{ \begin{array}{l} \mu(K(u, S_1), K(v, S_2)), \\ \mu(u, K(v, S_2)), \\ \mu(K(u, S_1), v) \end{array} \right\}.$$

Proof. To form maximum agreement subtrees of $S_1^u$ and $S_2^v$, there are three cases.
(1) $\mu(K(u, S_1), K(v, S_2))$ accounts for matching $u$ to $v$. (2) $\mu(u, K(v, S_2))$ accounts
for matching $u$ to a proper descendant of $v$. (3) $\mu(K(u, S_1), v)$ accounts for matching
$v$ to a proper descendant of $u$. □

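Lemma 5.8. For all $(x_i, y_j)$ where $i < p'$ and $j < q'$, regardless of whether $(x_i, y_j)$
is intersecting or nonintersecting,

$$\mathrm{RR}(S_1^{x_i}, S_2^{y_j}) = \max \left\{ \begin{array}{l} \mu(X_i, Y_j) + \mathrm{RR}(S_1^{x_{i+1}}, S_2^{y_{j+1}}), \\ \mu(X_i \cup \{x_{i+1}\}, Y_j \cup \{y_{j+1}\}, x_{i+1}, y_{j+1}), \\ \mu(x_i, Y_j), \\ \mathrm{RR}(S_1^{x_i}, S_2^{y_{j+1}}), \\ \mu(X_i, y_j), \\ \mathrm{RR}(S_1^{x_{i+1}}, S_2^{y_j}) \end{array} \right\}.$$

Proof. This lemma follows from Fact 2 with a finer case analysis for the cases in
the proof of Fact 2. □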
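
The finer case analysis can be spelled out as follows. This is a sketch under the assumption that $K(u, S)$ denotes the set of children of $u$ in $S$, so that $K(x_i, S_1) = X_i \cup \{x_{i+1}\}$ and $K(y_j, S_2) = Y_j \cup \{y_{j+1}\}$ whenever $i < p'$ and $j < q'$; $\mu$ is the matching weight defined at the start of this subsection. Each of the three terms of Fact 2 splits according to whether the path child is used.

```latex
\begin{align*}
\mu(K(x_i,S_1),K(y_j,S_2))
  &= \max\bigl\{\mu(X_i,Y_j) + \mathrm{RR}(S_1^{x_{i+1}},S_2^{y_{j+1}}),\;
     \mu(X_i\cup\{x_{i+1}\},\,Y_j\cup\{y_{j+1}\},\,x_{i+1},\,y_{j+1})\bigr\},\\
\mu(x_i,K(y_j,S_2))
  &= \max\bigl\{\mu(x_i,Y_j),\; \mathrm{RR}(S_1^{x_i},S_2^{y_{j+1}})\bigr\},\\
\mu(K(x_i,S_1),y_j)
  &= \max\bigl\{\mu(X_i,y_j),\; \mathrm{RR}(S_1^{x_{i+1}},S_2^{y_j})\bigr\}.
\end{align*}
```

In the first line, a maximum-weight matching either contains the edge $(x_{i+1}, y_{j+1})$, leaving a matching between $X_i$ and $Y_j$, or avoids it. Taking the maximum over the three right-hand sides gives the recurrence.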
Lemma 5.9. For each nonintersecting $(x_i, y_j) \in E - B$ with $P$-predecessor $(x_{i'}, y_j)$
and $Q$-predecessor $(x_i, y_{j'})$,

$$\mathrm{RR}(S_1^{x_i}, S_2^{y_j}) = \max \left\{ \begin{array}{l} \max_{j'' \in [j, j'-1]} \mu(x_{i'}, Y_{j''}) + \max_{i'' \in [i, i'-1]} \mu(X_{i''}, y_{j'}), \\ \mathrm{RR}(S_1^{x_i}, S_2^{y_{j'}}), \\ \mathrm{RR}(S_1^{x_{i'}}, S_2^{y_j}) \end{array} \right\}.$$

Proof. This lemma follows from Lemma 5.4(2) and is obtained by iterative applications
of Lemma 5.8. The following properties are used. Since $(P[i, i' - 1], Q[j, j' - 1])$
is nonintersecting, for $i'' \in [i, i' - 1]$ and $j'' \in [j, j' - 1]$,

• $\mu(X_{i''}, Y_{j''}) = 0$;

• $\mu(X_{i''} \cup \{x_{i''+1}\}, Y_{j''} \cup \{y_{j''+1}\}, x_{i''+1}, y_{j''+1}) = \mu(x_{i''}, Y_{j''}) + \mu(X_{i''}, y_{j''})$;

• $\mu(x_{i''}, Y_{j''}) = \mu(x_{i'}, Y_{j''})$;

• $\mu(X_{i''}, y_{j''}) = \mu(X_{i''}, y_{j'})$.

□

For brevity, the symmetric statement of the next lemma for $G_Q$ is omitted.

Lemma 5.10. For all nonintersecting pairs $(x_i, y_1) \in G_P - B$ with $Q$-predecessor
$(x_i, y_j)$,

$$\mathrm{RR}(S_1^{x_i}, S_2^{y_1}) = \max \left\{ \begin{array}{l} \mathrm{RR}(S_1^{x_i}, S_2^{y_j}), \\ \mathrm{RR}(S_1^{x_{i+1}}, S_2^{y_1}), \\ \mu(X_i, y_j) + \max_{j' \in [1, j-1]} \mu(x_{i+1}, Y_{j'}) \end{array} \right\}.$$

Proof. The proof is similar to that of Lemma 5.9 and follows from Lemma 5.4(3). □



5.6. The algorithm for Problem 1. We combine the discussion in §5.3-§5.5
to give the following algorithm to solve Problem 1.

Algorithm One-One;

begin

1. Compute $S_1$, $S_2$, $P'$, $Q'$, RP$(S_1^u, S_2, Q')$ for $u \in K(P', S_1)$, and RP$(S_2^v, S_1, P')$
for $v \in K(Q', S_2)$;

2. Compute $G \cup E \cup B$, $B$, $I(P[1, p' - 1], Q[1, q' - 1]) - B$, the set of all
nonintersecting pairs in $E - B$, and the sets of nonintersecting pairs in $G_P - B$
and $G_Q - B$, respectively;

3. Compute the following predecessors:

• the $P$-predecessor, $Q$-predecessor and $PQ$-predecessor of each pair in
$I(P[1, p' - 1], Q[1, q' - 1]) - B$;

• the $P$-predecessor and $Q$-predecessor of each nonintersecting pair in $E - B$;

• the $Q$-predecessor of each nonintersecting pair in $G_P - B$ and the
$P$-predecessor of each nonintersecting pair in $G_Q - B$;

4. For all pairs in $G \cup E \cup B$, compute the non-RR terms in the appropriate
recurrence formulas of §5.5:

• Lemma 5.7 for $B$;

• Lemma 5.8 for $I(P[1, p' - 1], Q[1, q' - 1]) - B$;

• Lemma 5.9 for the nonintersecting pairs in $E - B$;

• Lemma 5.10 for the nonintersecting pairs in $G_P - B$ and its symmetric
statement for the nonintersecting pairs in $G_Q - B$;

5. Compute RR$(S_1^{x_i}, S_2^{y_j})$ for all $(x_i, y_j) \in G \cup E \cup B$ using the appropriate
recurrence formulas given in §5.5 and the non-RR terms computed at Step 4;

6. Compute the output as follows:

• For all $y_j \in Q$, RP$(T_1, T_2, Q)(y_j) \leftarrow$ RR$(S_1^{x_1}, S_2^{y_j})$;

• For all $x_i \in P$, RP$(T_2, T_1, P)(x_i) \leftarrow$ RR$(S_1^{x_i}, S_2^{y_1})$;

• For every $v \in K(Q, T_2)$, RP$(T_1, T_2, Q)(v) \leftarrow$ RP$(T_2^v, T_1, P)(h)$ where $h$ is
the root of $T_1 \| T_2^v$;

• For every $u \in K(P, T_1)$, RP$(T_2, T_1, P)(u) \leftarrow$ RP$(T_1^u, T_2, Q)(h)$ where $h$ is
the root of $T_2 \| T_1^u$;

end.

To analyze One-One, we first focus on Step 4. The recurrences of §5.5 contain
only four types of non-RR terms other than the constant in Lemma 5.7:

1. $\mu(X_i, y_j)$ and $\mu(x_i, Y_j)$;

2. $\max_{i \in [i_1, i_2]} \mu(X_i, y_j)$ and $\max_{j \in [j_1, j_2]} \mu(x_i, Y_j)$;

3. $\mu(X_i, Y_j)$;

4. $\mu(X_i \cup \{x_{i+1}\}, Y_j \cup \{y_{j+1}\}, x_{i+1}, y_{j+1})$.

It is important to notice that these non-RR terms can be simultaneously evaluated.
In light of this observation, we compute these terms by using the techniques of §5.1
to process the normal sequences $A_i$, $A_u$, $B_j$, $B_v$ defined below:

• $A_i(j) = \mu(X_i, y_j)$ for all $x_i$ and $y_j$.

• $B_j(i) = \mu(x_i, Y_j)$ for all $y_j$ and $x_i$.

• $A_u(j)$ = RR$(S_1^u, S_2^{y_j})$ for all $u \in K(P', S_1)$ and $y_j$.

• $B_v(i)$ = RR$(S_2^v, S_1^{x_i})$ for all $v \in K(Q', S_2)$ and $x_i$.

Note that $A_i$ and $A_u$ have length $q'$, and $A_i$ is the joint of all $A_u$ where $u \in X_i$.
Similarly, $B_j$ and $B_v$ have length $p'$, and $B_j$ is the joint of all $B_v$ where $v \in Y_j$.
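
Because a matching against a single vertex $y_j$ uses at most one edge, $A_i(j) = \mu(X_i, y_j)$ is the maximum of $A_u(j)$ over $u \in X_i$, so in this setting the joint acts as a pointwise maximum. The sketch below illustrates that on a simple breakpoint encoding. The encoding is an assumption made for illustration: a form is a list of pairs $(j, w)$, the value at position $t$ is the $w$ of the smallest listed $j \geq t$, and it is 0 past the last listed index. The paper's own definition of condensed forms is in §5.1, and the function names here are hypothetical. The code expands sequences explicitly, so it does not achieve the linear bounds obtained with radix sort.

```python
def expand(form, length):
    """Expand a list of (j, w) breakpoints (1-indexed j) into a full list."""
    listed = dict(form)
    values = [0] * length
    current = 0  # value past the last breakpoint
    for t in range(length, 0, -1):
        if t in listed:
            current = listed[t]
        values[t - 1] = current
    return values


def compress(values):
    """Keep only positions whose value differs from that of the next position."""
    form = []
    for t in range(1, len(values) + 1):
        following = values[t] if t < len(values) else 0
        if values[t - 1] != following:
            form.append((t, values[t - 1]))
    return form


def joint(forms, length):
    """Pointwise maximum of several sequences given as breakpoint lists."""
    result = [0] * length
    for form in forms:
        for k, value in enumerate(expand(form, length)):
            result[k] = max(result[k], value)
    return compress(result)
```

For example, joining `[(2, 5), (4, 3)]` and `[(3, 4)]` over length 5 gives the sequence 5, 5, 4, 3, 0, whose breakpoints are `[(2, 5), (3, 4), (4, 3)]`.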
Lemma 5.11.

1. The minimal condensed forms of the sequences $A_u$ and $B_v$ have a total size
of $O(n)$ and can be computed in $O(n)$ time.

2. The minimal condensed forms of the sequences $A_i$ and $B_j$ have a total size
of $O(n)$ and can be computed in $O(n)$ time.

Proof. Statement 2 follows from Statement 1 and Lemma 5.1. Below we only
prove Statement 1 for $A_u$; Statement 1 for $B_v$ is similarly proved. We first compute
a condensed form $\hat{A}_u$ for each $A_u$ as follows:

1. For all $u \in K(P', S_1)$, compute $S_{2,u} = S_2 \| S_1^u$ and $Q_u = Q' \| S_1^u$;

2. For all $u$ where $Q_u$ is nonempty, do the following steps:

(a) $\hat{A}_u \leftarrow \{(j, w) \mid y_j \in Q_u, w =$ RP$(S_1^u, S_2, Q')(y_j)\}$.

(b) Compute all tuples $(\hat{v}, v, y_j)$ where $\hat{v} \in K(Q_u, S_{2,u})$, $v \in K(Q', S_2)$,
$\hat{v} \in S_2^v$, and $v \in Y_j$.

(c) Find the smallest $s$ such that some $(\hat{v}, v, y_s)$ is obtained at Step 2b.

(d) If there is only one $(\hat{v}, v, y_s)$, then add to $\hat{A}_u$ the pair $(s, w)$ where
$w =$ RP$(S_1^u, S_2, Q')(v)$.

3. For all $u$ where $S_{2,u}$ is nonempty and $Q_u$ is empty, do the following steps:

(a) Compute $\hat{v}$, $v$ and $y_s$ where $\hat{v}$ is the root of $S_{2,u}$, $v \in K(Q', S_2)$, $\hat{v} \in S_2^v$
and $v \in Y_s$.

(b) $\hat{A}_u \leftarrow \{(s, w)\}$, where $w =$ RP$(S_1^u, S_2, Q')(v)$.

4. For all $u$ where $S_{2,u}$ is empty, $\hat{A}_u \leftarrow \emptyset$.

The correctness proof of this algorithm has three cases.

Case 1: $Q_u$ is nonempty. Let $y_{j_1}, y_{j_2}, \ldots, y_{j_k} = Q_u$. Let $j_0 = 0$. Then, for all
$k' \in [1, k]$ and all $j \in [j_{k'-1} + 1, j_{k'}]$, $S_2^{y_j} \| S_1^u = S_{2,u}^{y_{j_{k'}}}$ and by Lemma 3.1, $A_u(j) =$
RP$(S_1^u, S_2, Q')(y_{j_{k'}})$. There are two subcases for $j > j_k$.

Case 1a: Step 2b finds two or more $(\hat{v}, v, y_s)$. Then $y_s \in Q_u$, $s = j_k$, and for all
$j \in [j_k + 1, q']$, $S_2^{y_j} \| S_1^u$ is empty and $A_u(j) = 0$.

Case 1b: Step 2b finds only one $(\hat{v}, v, y_s)$. Then $y_s \notin Q_u$ and $s > j_k$. For all
$j \in [j_k + 1, s]$, $S_2^{y_j} \| S_1^u = S_{2,u}^{\hat{v}}$ and $A_u(j) =$ RP$(S_1^u, S_2, Q')(v)$. For all $j \in [s + 1, q']$,
$S_2^{y_j} \| S_1^u$ is empty and $A_u(j) = 0$.

Thus, the $\hat{A}_u$ of Step 2 is a condensed form of $A_u$ for Case 1.

Case 2: $S_{2,u}$ is nonempty and $Q_u$ is empty. This case is similar to Case 1b, and
Step 3 computes a correct condensed form $\hat{A}_u$ for this case.

Case 3: $S_{2,u}$ is empty. This case is obvious, and Step 4 correctly computes a
condensed form $\hat{A}_u$ of $A_u$ for this case.

The total size of all $\hat{A}_u$ is at most that of the RP mappings of $S_1$, $S_2$, $P'$ and $Q'$,
which is the desired $O(n)$. Step 1 takes $O(n)$ time using Fact 1. The other steps can
be implemented in $O(n)$ time in straightforward manners using radix sort and tree
traversal. As discussed earlier, the RP mappings are evaluated by radix sort. Once the
forms $\hat{A}_u$ are obtained, we can in $O(n)$ time radix sort the pairs in all $\hat{A}_u$ and then
delete all unnecessary pairs to obtain the desired minimal condensed forms. □

Lemma 5.12. All the non-RR terms of the first two types for the pairs in $G \cup E \cup B$
can be evaluated in $O(n \log(p + 1) \log(q + 1))$ time.

Proof. The value of $\mu(X_i, y_j)$ is that of the point query $([i, i], j)$ for $A_1, \ldots, A_{p'}$,
and the value of $\max_{i \in [i_1, i_2]} \mu(X_i, y_j)$ is that of the interval query $([i_1, i_2], j)$. By
Lemma 5.5(4), there are $O(n \log(p + 1))$ such terms required for the pairs in $G \cup E \cup B$.
Given the results of Steps 2 and 3 of One-One, we can determine all such terms and the
corresponding queries in $O(n \log(p + 1))$ time. By Lemma 5.6(5), only $O(n \log(q + 1))$ of
these queries are not $P$-regular. By Lemmas 5.11(2) and 5.2(2), we can evaluate these
queries in $O(n \log(p + 1) \log(q + 1))$ time. The terms $\mu(x_i, Y_j)$ and $\max_{j \in [j_1, j_2]} \mu(x_i, Y_j)$
are similarly evaluated in $O(n \log(p + 1) \log(q + 1))$ time. The analysis for these terms
is easier because $p' \geq q'$ and it does not involve the notion of $Q$-regularity. □

Lemma 5.13. The non-RR terms of the third and the fourth type for the pairs in
$G \cup E \cup B$ can be evaluated within the following time complexity:

1. $O(nd \log d)$ or alternatively $O(n\sqrt{d} \log n)$ for the third type;

2. $O(nd^2 \log d)$ or alternatively $O(nd\sqrt{d} \log n)$ for the fourth type.

Proof. To prove Statement 1, we consider the graphs $(X_i, Y_j, X_i \times Y_j)$ on which
the desired terms $\mu(X_i, Y_j)$ are defined. Let $Z_{i,j}$ be the subgraph of $(X_i, Y_j, X_i \times Y_j)$
constructed by removing all zero-weight edges and all resulting isolated vertices. The
edges of $Z_{i,j}$ are computed as follows:

1. For all $u \in K(P', S_1)$, compute $S_{2,u} = S_2 \| S_1^u$ and $Q_u = Q' \| S_1^u$.

2. For all $u$ where $S_{2,u}$ is nonempty, do the following steps:

(a) If $Q_u$ is nonempty, compute all tuples $(u, v, w)$ where $\hat{v} \in K(Q_u, S_{2,u})$,
$v \in K(Q', S_2)$, $\hat{v} \in S_2^v$ and $w =$ RP$(S_1^u, S_2, Q')(v)$.

(b) If $Q_u$ is empty, compute the tuple $(u, v, w)$ where $\hat{v}$ is the root of $S_{2,u}$,
$v \in K(Q', S_2)$, $\hat{v} \in S_2^v$ and $w =$ RP$(S_1^u, S_2, Q')(v)$.

This algorithm captures all the nonzero-weight $(u, v)$. At Step 2, $S_{2,u}^{\hat{v}} = S_2^v \| S_1^u$ and
by Lemma 3.1, RR$(S_1^u, S_2^v)$ = RP$(S_1^u, S_2, Q')(v)$. Thus, the first two components of
the obtained tuples form the edges of all desired $Z_{i,j}$ and the third components are
the weights of these edges. We use Fact 1 to implement Step 1 in $O(n)$ time. We
can implement Step 2 in $O(n)$ time using radix sort and tree traversal. Note that
Step 2 uses radix sort to evaluate RP mappings. With the tuples $(u, v, w)$ obtained,
we use radix sort to construct all desired $Z_{i,j}$ in $O(n)$ time. Let $m_{i,j}$ and $n_{i,j}$ be
the numbers of edges and vertices in $Z_{i,j}$, respectively. Since an edge weighs at
most $n$, we can compute $\mu(X_i, Y_j)$ in $O(n_{i,j} m_{i,j} + n_{i,j}^2 \log n_{i,j})$ time and alternatively in
$O(m_{i,j} \sqrt{n_{i,j}} \log(n \cdot n_{i,j}))$ time. Statement 1 then follows from the fact that
$n_{i,j} \leq 2d'$, $n_{i,j} \leq 2m_{i,j}$, and by Lemma 5.5(1) the sum of all $m_{i,j}$ is at most $n$.

To prove Statement 2, we similarly process the bipartite graphs on which the
desired terms $\mu(X_i \cup \{x_{i+1}\}, Y_j \cup \{y_{j+1}\}, x_{i+1}, y_{j+1})$ are defined. The key difference
from the third type is that in addition to some of the edges in $Z_{i,j}$, we need certain
nonzero-weight $(u, y_{j+1})$ for $u \in X_i$ and $(x_{i+1}, v)$ for $v \in Y_j$. Since these edges are
required only for intersecting $(x_i, y_j)$, by Lemma 5.5(1), $O(dn)$ such edges are needed.
We use Lemma 5.11(1) to compute the weights of these edges in $O(dn)$ time. Due to
these edges, the total time complexity for the fourth type is $O(d)$ times that for the
third type. □

The next theorem serves to prove Theorem [hl| given at the start of §|j. 
Theorem 5.14. One-One solves Problem 1 with the following time complexities:

$O(nd^2 \log d + n \log(p + 1) \log(q + 1))$,

or alternatively

$O(nd\sqrt{d} \log n + n \log(p + 1) \log(q + 1))$.



Proof. The correctness of One-One follows from Lemma 5.3 and §5.3-§5.5. As for
the time complexity, Step 1 is obvious and takes $O(n)$ time. By computing $\Psi$, we can
compute the sets $E$ and $I(P[1, p' - 1], Q[1, q' - 1])$. Since the leaf labels of $S_1$ and $S_2$ are
from $[1, O(n)]$, each level of $\Psi$ can be computed in $O(n)$ time. Since $\Psi$ has $O(\log(p + 1))$
levels, $E$ and $I(P[1, p' - 1], Q[1, q' - 1])$ can be computed in $O(n \log(p + 1))$ time. With
these two sets obtained, we can compute all the desired sets in $O(n \log(p + 1))$ time.
Thus, Step 2 takes $O(n \log(p + 1))$ time. Step 3 takes $O(n \log(p + 1))$ time using
radix sort. The time complexity of Step 4 dominates that of One-One. This step
uses Lemmas 5.12 and 5.13 and takes $O(n \log(p + 1) \log(q + 1) + nd^2 \log d)$ time or
alternatively $O(n \log(p + 1) \log(q + 1) + nd\sqrt{d} \log n)$ time. Step 5 spends $O(n \log(p + 1))$
time using radix sort to create pointers from the pairs in $G \cup E \cup B$ to appropriate
predecessors. Step 5 then takes $O(1)$ time per pair in $G \cup E \cup B$ and $O(n \log(p + 1))$
time in total. Step 6 takes $O(n \log(p + 1))$ time. It uses radix sort to access the desired
RR values and evaluate the input mappings. It also uses Fact 1 to compute all $T_1 \| T_2^v$
and $T_2 \| T_1^u$. □

6. Discussions. We answer the main problem of this paper with the following 
theorem and conclude with an open problem. 

Theorem 6.1. Let $T_1$ and $T_2$ be two evolutionary trees with $n$ leaves each. Let
$d$ be their maximum degree. Given $T_1$ and $T_2$, a maximum agreement subtree of $T_1$
and $T_2$ can be computed in $O(nd^2 \log d \log^2 n)$ time or alternatively in $O(nd\sqrt{d} \log^3 n)$
time. Thus, if $d$ is bounded by a constant, a maximum agreement subtree can be
computed in $O(n \log^2 n)$ time.

Proof. By Theorem 4.5, the algorithms in the preceding sections compute RR$(T_1, T_2)$
within the desired time bounds. With straightforward modifications, these algorithms can
compute a maximum agreement subtree within the same time bounds. □

The next lemma establishes a reduction from the longest common subsequence 
problem to that of computing a maximum agreement subtree. 

Lemma 6.2. Let $M_1 = x_1, \ldots, x_n$ and $M_2 = y_1, \ldots, y_n$ be two sequences. Assume
that the symbols $x_i$ are all distinct and so are the symbols $y_j$. Then, the problem of
computing a longest common subsequence of $M_1$ and $M_2$ can be reduced in linear time
to that of computing a maximum agreement subtree of two binary evolutionary trees.
Proof. Given $M_1$ and $M_2$, we construct two binary evolutionary trees $T_1$ and $T_2$
as follows. Let $z_1$ and $z_2$ be two distinct symbols different from all $x_i$ and $y_j$. Next,
we construct two paths $P_1 = u_1, \ldots, u_{n+1}$ and $P_2 = v_1, \ldots, v_{n+1}$. $T_1$ is formed by
making $u_1$ the root, attaching $x_i$ to $u_i$ as a leaf, and attaching $z_1$ and $z_2$ to $u_{n+1}$ as
leaves. Symmetrically, $T_2$ is formed by making $v_1$ the root, attaching $y_i$ to $v_i$, and
attaching $z_1$ and $z_2$ to $v_{n+1}$. The lemma follows from the straightforward one-to-one
onto correspondence between the longest common subsequences of $M_1$ and $M_2$ and
the maximum agreement subtrees of $T_1$ and $T_2$. □
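
The construction in the proof is easy to write out. The sketch below builds each tree as nested tuples, where a path vertex is `(leaf, rest_of_path)` and the last path vertex is `(z1, z2)`; this encoding, the default names `"z1"` and `"z2"`, and the function names are illustrative assumptions. For reference it also computes the length of a longest common subsequence of two sequences with distinct symbols via a longest increasing subsequence, which is a standard technique and not part of the paper's algorithm.

```python
from bisect import bisect_left


def caterpillar(symbols, z1="z1", z2="z2"):
    """Binary tree with path u_1..u_{n+1}: symbols[k] hangs off the k-th path
    vertex and the last path vertex carries the two extra leaves z1 and z2.
    Assumes z1 and z2 do not occur among the symbols."""
    tree = (z1, z2)
    for s in reversed(symbols):
        tree = (s, tree)
    return tree


def lcs_length_distinct(m1, m2):
    """Length of a longest common subsequence when each sequence has
    pairwise distinct symbols, via a longest increasing subsequence."""
    position = {s: k for k, s in enumerate(m2)}
    mapped = [position[s] for s in m1 if s in position]
    tails = []  # tails[k] = smallest tail of an increasing run of length k+1
    for x in mapped:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)
```

Building both trees takes time linear in $n$, in line with the linear-time reduction claimed by the lemma.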



We can use Lemma 6.2 to derive lower complexity bounds for computing a maximum
agreement subtree from known bounds for the longest common subsequence
problem in various models of computation || ||, ^3], ^9] ||(| [32], |5^]. This paper as- 
sumes a comparison model where two labels x and y can be compared to determine 
whether $x$ is smaller than $y$ or $x = y$ or $x$ is greater than $y$. Since the longest common
subsequence problem in Lemma 6.2 requires $\Omega(n \log n)$ time in this model [31], the
same bound holds for the problem of computing a maximum agreement subtree of
two evolutionary trees where $d$ is bounded by a constant. It would be significant to
close the gap between this lower bound and the upper bound of $O(n \log^2 n)$ stated in
Theorem 6.1. Recently, Farach, Przytycka and Thorup [12] independently developed
an algorithm that runs in $O(n\sqrt{d} \log^3 n)$ time. For binary trees, Cole and Hariharan
gave an $O(n \log n)$-time algorithm. It may be possible to close the gap by
incorporating ideas used in those two results and this paper.